(54) HASH COMPACT XML PARSER



(57) Abstract:

PROBLEM TO BE SOLVED: To provide a method of parsing a markup
language document including syntactic elements that eases the competing
memory and processing burdens in an apparatus having hardware
constraints.

SOLUTION: A hash compact XML parsing method comprises a step 310 of
identifying the type of one of the syntactic elements, a step 318 of
processing the element by determining a hash representation thereof if
the type is a first type, and a step 314 of augmenting an at least
partial structural representation of the document using the hash
representation if the type is the first type.
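
The three steps of the abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes that the "first type" is a start or end tag, that the hash representation is a CRC-32 of the tag name, and that the structural representation is a flat list. The names TOKEN, hash_name and parse are hypothetical.

```python
import re
import zlib

# Hypothetical tokenizer: a comment, a start/end tag, or a run of text.
TOKEN = re.compile(r"<!--.*?-->|<(/?)([A-Za-z][\w.-]*)[^<>]*>|([^<>]+)", re.S)


def hash_name(name: str) -> int:
    """Hash representation of a tag name (assumption: CRC-32)."""
    return zlib.crc32(name.encode("utf-8"))


def parse(document: str) -> list:
    """Build a partial structural representation of the document."""
    structure = []
    for match in TOKEN.finditer(document):
        closing, name, text = match.groups()
        # Step 310: identify the type of the syntactic element.
        if name is not None:
            # Step 318: the element is of the first type (a tag), so hash it.
            hashed = hash_name(name)
            # Step 314: augment the structure using the hash representation.
            structure.append(("close" if closing else "open", hashed))
        elif text and text.strip():
            # Other element types are stored unhashed in this sketch.
            structure.append(("text", text.strip()))
        # Comments match the first alternative and are skipped.
    return structure


STRUCTURE = parse("<a><!-- note --><b>hi</b></a>")
```

Storing a fixed-size hash instead of the tag string is what keeps the structure compact; two tags with the same name always map to the same value, so start and end tags can still be paired.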
[fs*JS2] hsans-efc-sct^at-rsiit

i \z&mo>mi?i?5&. 

'J V 7 U>X-f V^r -r V— * i-fctfFf 
us* I9^rai=tt#-r *«rSH/\ ^ 7 ;u =r 'J XA^ro 

B2 U 77 u>x. 

■5lirl2M-v->3.T;u^ , JXA'va>m3 u>x. 

It© t -T H*« i i:l5Sro««f 
[St#gi4] firiESgi ^-<^ii, 

3t^»mT"C?&-5 c <t t f -5>iS*JS 2 lc|S@«>« 

cTi:lzzL^-^?S3-K-C'li'5:<. StrfB-tSX^iif*. 
(i) l5ISiim<03t^», Xtf (iOI&iBgSiroX^COIl 

2 \z&mo>mifiJ5m. 

^(Dg1 >f >X^< >X-efc-5l98E-«lXStgi: (i i) 

mfiBmi >x«i;. tatem 2 -r>x 

x nc*^>Ti%sci:$^ai:-rsiS*«2iriesro« 



-> mm. z -t * ss 2 «aax x -> zr t , 
(i 1 ) itri5m2/w>igssfflL'»-c. iirEHX§0>'> 

ttriE«iax^»>^'jti;tiriHm2fflisx7 L -y^ , i*. isih- 

mi a>H«l-^-r«>m2 0||«A^ StFl5-gm<7)/\-y 

fjsate^mi*, $ST$?"c-&y. 
fcy. 

Iill2te^<0m2/\^*>3.S3SI*. ;ttf&/\-;/->J.**T$ 

^»T?a&y. 

itrtsis*6^ytii--eS)y. 

filfB/ n 'y v 3. §B«i $f y t S - 0) t fl). 

^-^ u- ^ 1= J: o x mm * *tfc / \ -y -> i IBSfe 9 7 1 m - 
2 (=iB®ro««f*aE„ 

[1S*«1 4] ItlfE-gSii&tffiriBteSStl*. -tn-T 
1!rEm2$'y'<TS«!iSL. >t*f5m2/N'y->iilte / /» 
7 * V £f&m+ 3 Xt^ Zft .
ftrffim2/\->->zListe/«7^y^ firism 1 * ^<-*7 

i ^-y^Trt-erot&E^s^XTro^x htiasiro*. 
xi-sstfcft. itne*tJ5m2/\-vv3.§i*fe/^T^y 

$su -t^(-«fcy> |trfBm2^^'<T{--Pfr(D^*i.-e 
nroffisg/ n -> mtk/m t s ^ * »j$ l . 
fUEiix-x^-;/^!*. itri5B2^^'<ri--DLNTa>ffris 

it. 

|trl5S5S/\'V*>a.g|Jfe/$lT^yii. m&m-i ZV't-T 

^xTWeOTfaES^s-y^Tco^-x h£isii0>*x h 

Ci§*3Sl 6] SiJiSii^XT^^lzglO-C. 
SSJett^i-y^iis ltrE-7— > r -y ^SgXSoitFffi 

mizm e. l t =f- x -v ? -r a x t? - 1 £ 

[Ef$«l 7] liJE&MfcWSrox^^l-^T. M 

I-. 

6 lc|E®C0fS«T7J^. 
Cit*3l 1 8 ] 15E35 1 * -f ^l*s 
SitgSiS&li^-O-SB.. 

C8t#JSl 9} firE8l&gtiili*?-T?£>£-i:£ ! l#fai: 

ekcds?*t^&. 

fir IE?- x «y £ X t- v ~?l£* 



$17 $ ?£i$»Jt S*7xf 

tSES 2 / \ -> a. m 7 $ <f *<S!TEm 1 / \ -> a. §8 

£-£tizt£&®k -r sit 2 o iz&mnmm 

[|f#JS2 2] fijES&W;SlcigiivT\ 
f!lE«S:MIiJl=«£^. ltfE3CSro|iTE'>'5: < ttSB^M 

if -5>i§#Jg2 1 lc|Em<0<S«r*^. 

[is#jg2 3] axasm=Ri€)Lr. |trE3t«rosirE^ 

llf*S2 4] flJESXmSlJI*. ^^'(DiiE'a^X hl= 

ffrE^x-v^xx^^ii. 

tilEX:SrotirE'>^: < t tS?»M<S«litS^i3lt*/\ 

•>•>zL^ly|cMyaMl±S{^^Tt^. ^*ti=«ty. iSEm 
«7^ y^pisij-r-s-y-^XT^ ^t. 

ffiE&3I/\-y->ii§te/*S7$?'*<. SUEmi / w->3- 

*^fc-t^at-r-&it^2 3 i-t5Kro<s«f^5£. 
[Ei^2 5] itrEaMit«(=«^r, 
firE3tsroiirE'>'& < t tspfl-wsmit^sucfcita-t- 

s 5 ^$si=#fc^ <t f •5it*ja 2 4 1= 

Emofg«r*^. 

[g$a2 6] KFES^tt^x-v^^^-y^tt. ssit 

(a) fi|ltt.*l. (b) Xli (c) S^gJit, 

(a) firEVRD^ffla-rsffia-y-^x^-y^s^. 

KffiS-y-^X^f -y^l±. VRDi;J3lt-5<0«IXS^c:i: 

(i) vRD<o|tiE«X^ro^-f ^$ia»j-r^-9-^Ht 
(li) ItTE^f ttrES 1 ^ -f ^roig^iz, fiUE 

«s:sms. f©nv>aaa£asn:ti:j:^t 

fiTESai4«SgXx->^li. Ml=. 

(b) «i3i**i.fcvRDi=Bae. LTtirE-7— 97vzrm 
mxvom&'jns; < t te?»Mtc«Jts^s^ x ^ r 

^l*. giftro#JS/N-v->:i&S£Sttl:rlt8-t3>^-9- 
(a) 7-^7"^fSX8^. ^-CT-SStSi)*^*

(i i ) flJKK*3.y>h$?pgJlf-fctt-i>tir<BK*i 
(i i i ) S5fEK*a*>h*'7*<fiI0>K*3.>>h£ 

(b) VRDf, -f-c-CglSiJi^S^-ywtlc. 

(i i) ^Sfr-r-s^-ypg^i^fcitsttrcD^ycD/N^vi 

[183*312 8] V R D ICBg & — 07 V ^SXS 

(a) 1irf2VRD£. *0>tj:fr-e&%\\2tltzmi&WmZ 

(i i i) t9iavRDro«jgs^i-fci+i, % fiffsetits 
*rofiiB/\ -J -> a. ggia *8*l§te £*&*a-t s x t- v -f 

(b) -ero^AN-esisijs^fc 



( i i i ) SUlSXSOUgJiSStKifcltSu fiJiHXWftii 

«t. 

[IS#]S2 9] ItrlHSfcMifcSX^-v^l^^r. HI-, 

3 o ] tvest i s -r ^i*. 

-r-5>^*^3 Olr|5KO«*f^„ 
[1331113 2] «t£g*£#t?-*— 5? T-V^SgS:S£ 

^xsskd $ zf *> * ^ •> ^<t . 

MJSicoE*gSSi£9lj£ "T <& - 1 . 
(i i i) ttE«XR**«J#-r*C4:. 

£ ^t; C £ £ #® <!: -f i> fc-SHb^i* . 

sts $ a^-r * a#**i? & o r . 
fries-sHbtJixgiii©— ^l-•^L^•r. 

( i i ) ItJE^-r^Sl ^-f ^T-'fel+^lS. SllE»# 

<b«xs«rosjss^£*^-rs-t. 
(i i i ) tn&&mtm*i&m£&ft-r&zt^ 

z # t? c t s «® t -r *a#^o 
£>ffli31^a<!:.

t. 

(i i i) ttfBK*i/>h*^*<Sira>K*i>>h$ 

*^roi&E&^/x-;/->ig§i£te#rf -s^at. £^ 
*K 

lirE&sgsiasi-. 

(b) vrds, ■s-c-eHHMSJh.as^rtic, mz<f 

^bjboss i **-cfciMB^i=. fljs-r&^a 
s*u &#ai*. 

«>fSt. 

tesa-f-s^at. 

(c) IJEK+iJ* > h^^0)lilSEi£3g/\-v->J.a^ 

iirE'Jx hrti=ft*fr. lirE'J* sro>>/< 

BEL. Hl:<k-5t, firE"?— 7 v ~?MBi'X&0>&2h 

»* «*r - t -r 

<z>s a it £ «§-r -a fc o r . 

( • i ) UE«5lg8t<0/N-;>->ig3!£»£-f &^a 

t. 

(i i i) 1<FEVRD(7>8itagi(::fclfS. «TG*&B 



(b) 7-«77niXf*. t«94*t»SISiifc 

*ai4. 

c i ) mExs&ii3^rom:*iii4£&£-f s^at. 
c i i ) i9EX»«Jss*«)M-yi/3.aa$as-r*# 
at. 

( i i i ) 1trEXS<omita^l=33lt-5>. HTEX*«3i 

sissroitrE/Nf ->aaisi:is:itt«»atsfat. 

«TE«tSSettMI=. 

(c) iaEXS©firE«>tasii::fcit.&ltrE#s:#«ii 
gSaMtXHtt&tf/X"? ->aSat. UTEVRDroSTE 
$ £8511- 1+ -5 ttfc&SJI 14» tf/ x ? v a &m t £ it 
IStU fWcitJ, 1&E7 — >7-^fgS:Sfl)SStt 

£fcg-f£¥a£#-t£> c t $®&k ?z>&&mm<, 

( i ) strE^-f ^*<$i $f ^r-fctttf. mE^xg^ 
( i i ) ItrE* -< :7*<sn •S'-f^-efci-Hxii. firEOX 

(i i i ) tn&mxwmz&ftT&zt. 
coo-r4x*M-«t-3-r. i(iE«s:s**«isr*^at. 

itrEft^b^x^mro^ -r ^msu-r-s^at. 

3tgf?§(Di£/x v 9iSf £> - i: . 

ib«is:giRro«aass^-rs - 1 s 
$ ^t? c t 1 1 z ta#ss. 

-?-0/x<>v/ia^$aiS-r-5cti=«»:-3T. 
ttiE^-r^mi ^-c^t?fc*ii^i-itiE/x-y->a.ss 
■r*=i-Kt. 
[S*JS40] a>ea-9i:VRDl:ibLr7—>

o.^ > K^^rirl-. 6 K*ajt>h* *•;!><. ttJfcfS 
&a-Kii. 

(i ) WEK*a.y>h*y«)l<IJBtt«*aS+*3 — 

(i i ) UEK+ay > K^^^BlzfelfSlirro K*a. 

KS^tt/v;/ a TOEK^a.* 
> K * y Ot£^/ n •;/ v a. SS!£ -S a - K £: . 

(i i i ) TOEK*a*>h$?"ft<TOCDK*a../>h* 
y«fcy t3g<*X M~*:h-£Jt£l:^ TOEK*a.*:>h 

8515=1 > tfa— Jr f=j Ali^lz. 

(b) VRDJ, -tC-C^SiJ*tt5^^cri:lz. 

K£*U 6a-KI±. 

( i ) TOE*yoiigiftim£jfe5£-f s=>- Kt. 
(i i) wtB-rs^^pgfii^fcitsstiaj^yrovN^i/a. 
asc:at£**ti>TOE* •yrot£§i/\-> yasisast 

(i i o frE^yo!)itriBt£3i/N-y*>j.as* yx m= 

TOEa >fcfa.— * ?uf^U\X£\^ 

(c) TOEK*a.>>h*?*©TOE&5I/\-;/->aSSI 
Jb<- TOE'JX hrti:*44». TOE 'J X >A 
0>fi3>Ery-:7-fev 

[18*314 1] 3>ea-5i|:VRD|;IbLt7— ? 

(a) TOEV RD$, -tfl)ft^-CK»J**lfc«SS*r 
«yBT*a — KSd*. Ka— Kl*. 

( i ) TOE*JSgS5«>8IS:gte£*jrf Sa- Kfc, 
( i i ) TOE«USS*<0'N^->iaS*»S-*-*3- K 

(i i i) TOEVRD©«lfiaSI=telt4. TOE#3ig 
*©TOE/\*i'.a.*S*tffl»tM1t a — K 

TOE^n^AliMI-. 

(b) 7-^7 7^fSXt£. *«Xt*-CK»IS4ifc 

6a- Kl*. 

( i ) TOEX#«3tSSSa)«XB1t*aS-r*a- K 



( i i ) HeXfiSSSOA7->aSS$^f4a 
— Kfc. 

( i i i ) TOEXStOtSitgSiKfcl+iu TOEXSSi®: 
S*OTOE/W>a.g3Ri:«lS:SttS1&»-r*3- K 

TOE^n^? Alicia, 
(c) TOEX»a>TOE«36gSI=fclt4TOE«-:Jt««t36 

s*ro«xMttau/Nf>a.as*^ toevrdottoe 
-t-sti^ia. TOE7— >7?^ti»»»att««n 

t4a-K*^t;Ct^ti:t4a>ea-$^B^ 

^»HX»*»*fl:-r **«**fT* a > tf a.- 

iXItro^-f^SSlSltSa- Kfc. 
( i ) TOE* -f ^jb*8 1 *<f ^"Cftiitf. tE«XSt 

«>a.a«£Sfe5£-f 5C £ . 
( i i ) TOE* f^*<mi *-f :/-C«Ht*Ui. TOEffiX 

ESS StSf * C £ . 
(i i i) WE«Wt«**«»f* = fc. 
<7Hvf*t*Mr«fco-C, TOE«XK**«ia-r*a-K 

-* 7 -v ^WBX»*«-r *****1t*-&* a > e 
a—* ^Q^AfiSoT. 
fiEMftSXSSro * -f zf£ MM* i> a - K t . 

( i ) TOE* -i i * -f :/-cfc*itf. TOE»-SHb« 
X^fSSOTiS/x v ->aSS^t4ct, 

( i i ) TOE*-f ?A«Sl *-f ^fclfttCf, TOE»# 

itfltxKxeiuuHre asr set, 

(i i i) »ffttx««t»t* = i. 

£ ^ t? c t £ fflk i: t S ^ > tf a. - * 9 a 9 ^ A „ 
[|§3fcJa4 4] avtfa.— SlafltXSJRfc-StTT— *7 
v ^»BX»*«MfT a >ea-* 
T-Q^^t »«ftLf=a>ea.-*WSIISE«:$^t;a 
>tfa.— S^O-^ASgaT-feoT, 
TOEaVtfa-^^D-y^Ali, 
TOEfllXMI©* W ^^ia»J-r€.a- Kt. 
»E?-fWSi*-f?"efc4«^i:. TOESIXglg 
^-(D/\->i/aSaSS3lS-r*wi:l=*or. jas-r 

iE5"f^Sl *-f ^T*feS^^iaTOE/\-v>a.SS 
Stt^i$SttitSa>ea-$^ay7Aaa. 
<* * # ti => > tr i — * □ ^ =5 a §g a -e & o r .
itr ss ^ > e 3. - $ ^ □ y ^ a ii . 

(a) "7— t7v7nmX&%. fCT-MIS<l«i K4r 
(i i ) 1irI5K*a>>H$yteJflcfcMf.5lira>K*i 

1515=1 > ea-$ ^P^^AIiSgl-, 

(b) vrds. ^zxmfttin.&zvz't^ mz? 

K£*U Ktt, 

(i) firf2$-7<Dl§®{ag£2fe5rf -5=>-K<!:. 

(i i) «(S-rS^^K^I=S3l+^ttl<D5»-yro/\-y->i 

-5=1- Kfc. 

(c) uitaK^rj.>>h^-fcoiiiiHat?i/\-v->jLSs 

[IS3fcJS46] □>ea-9l:VRDtSbLT7-9 

(* $ ^ t? =i > tf i - $ zf a V 7 A S Si t- fe o T * 

fiTK^P^AIi. 

(a) fftlEV RDS, *0fcfr-r*SE&J£;h.fc*$iitglitc: 
ti:. fflS-f*a-K*#^ Kp-KI*. 

( i i i ) EiTKVRD(Dffi3tgS!l-fctt-£K IHK^IJSS 

(b) -7— 57vZfnmXm&. *<DtefrX-$&»l$*itz 



-ft. 

s^©itriB/\-> -sx.mwitmicm&£ftM?%>^- k 

B&SE^P^AIiMI^ 

(c) l5ffi3tS<DBtrl2«Jt^l=i3I+i.BIH^S:S1tjt 

®m©183tSttXi;/\-y->iS^. miEVRDroiiriB 
mat SSIC *S 1+ -5 *Jtfr«S:mi4& IS-' \ >y -> ^ BM\z -St 

-r-5=i- K££fcC t Zftmt-t &=i > If a— * 

&ti< t*>m*tottmmm.m-z&-3x. 
t(rc«xs*«)rta)-s*i=« u 
itris-^o^ -r ^ussd?- -ssmijxt- -j -it . 

ats-f -5 d £ * o-c-trosit^&ai-r £*&s;*t- ? ^ 
t. 

nan* 4 zttftunm i $-f zfxt>*u$. hi5/\->vj.s 

zfto&mi37iiiz£-ox±f$.z*izzt£ft&t-f&m 

imam* 8i ixit^t7-f7-^fgxts 
m«i-r^mmmmx&-ox. 

-JuMvtt. ( i ) MIEXSt. ( i i ) DIEJttS 
«f«f-T 5^S£l!llE^P-b -v-y-l-llft^-t** ^P ^7 A 

S^p^^Aii. 

(i i ) ItFfB^-f ^A<mi ^-C ^t?ife5^l=. ttrS2« 

xmmz. *o>/\v>n.&m£dt%.-fzzt\z£^x. 
mmt&^-ft. 

(i i i ) ftriBW ^-efc-sii^(=|triE/\ 

S^t? z t £4#St <= -r 5M«fSe. 
[1B3RJ14 9] IXISJtfcV RDI=B3t»LT«X^ 

(a) ^P-b-y-9-.t. 

(b) ( i ) tHRttt. ( 1 i ) ItTEV ROt, (i 

i i) X«<05im£&§-f S^UfcftFiE^P-tr-y-tHc 

(c) ItlfiB^P^^AI*. 

(ca) 7-^7«^fiX«S, ^C-eK»J*Hi»K 
U &a-h*ii.
(c a a) 15EK*.a><> 
a- Kir. 

(c a b) StfiEK*i*>h$-?PiJII=asit-i>f&<OK* 

y > h 9 <? roiKSg/ s •> 3. SSI * -5 a - K t . 
(c a c) ftrE K+3.> > h $ ^MOT K*a.j< > 

(cb) VRDS, -t-C-e^iJ^H-S^y-irlr. |§$ 

— K**U &a-KI*. 
(c b a) |ttlE$^<D^eS*3i^-rS=l-K<t. 
(ebb) tt£^5$^|gJf t-fcl+5fiI<D$-?(D/\-;/v 

a. aa icae $ ti -suite* ? rottss/ \ v *> a. * *s 
(c b c) HE*-y«>itrfBi£3i/\v$/.3.as*'jx m= 

(c c) m&b**iJ*>h$?<&1nlK»3f/v;/->i£3i 
fJEyx hrtl=*$A\ sfcl*. lirE'JX KZ>*>A 

EEL. **U=«fcy»E-*— f 77^tSJtt«)SfattS 

CW*«5 0] IXIf tftb v R D & LtiXB 

(a) ^o-fey-y-t. 

(b) (i)WES:»i. (i i) IJEVRDt. (i 

i i) X«a«ai££&ar*¥H£1trE:/ci-k?-0-l= 

L- 

(c) lUE^a^^AI*. 

(c a) s5tevrd£, -toft3er>T?atai**ifc«as* 

(c a a) firE8llt53&rofll3tH14£3«rrS=J-K 

(c a b) itrE«s»*©^v->a.ass*sr*3- 

(c a c) firEVRDOtttitagl-fclt-S. ifTEftfiS 

liJE^Q-y^AliMK. 
(cb) -troJS^T-SISi)**t 



&a-KI*. 

(c b a) f5E*S<8J£gi&rom:*:gtt£3l3rf £=3- 

(ebb) i5EX»«jtiE*<o/vf>isasas-r* 
(cbc) ta&xmom'&mmi-tniz. mtzxmm&. 

Ba«>WE/\ -> a. SI i fitXIil* tttt-f * a - K 

fed*. 

(cc) |trEXS<DltTE«|jilSmi-fcl+S|frE«-XS18 
«*©»lX«tt;*tf/\?»t'a.a«*. HJEVRD©19 

E«jtaai=fc«+-5»j6«xstta^\f>a.asiJt 

Kit * a - K S*t? C i: **« fc * S^SSS. 
[R5S85 1] VRDlrfBC,LT-7— ^T^^SffiStS 

tiiEVRDicteitsmi ^"T^os^aK *x h$*tfc 

«Xaai=-3i^-Cfl!>« i t£3f a. S3i£j*S-t£x 
vrd'JX H=«rESi ->ia8S»«t4X 

«Bl2lt*\7i'aMl»yilBVRD'J^H:«t 
ft&Ci fe#«i:-r 

lli*«5 2] fliESn $ <f ^rosirEHtxgigi*. 

ltrE«JSS«©S8. 

tirEaJi£3ta>ag. 
ItrE^iilsmo^-y^. 

■t&mntm 5 2 izEimro^s^a. 

[ft3*3S5 4] |ijEVRD0>i3<lESt,ai<^X h*4lfc 

axsstii. UtiEvRDroj£iae*iai3l. sfciilftE v r d 
fcaxsa-ea&y. 

ssii*. fjE^-^T^^m^xsoffiiawfltJii. a 
*i*M&*iza4ia<*-x t-2titzm*mmx-&&zkz 

cos si* $ as-r -s asgs-e fc o r . 
lirEvRD©mi 9 ■< Profit is < ^-x h$*tfc«s:s 
t\rm i i£?S/N a.safe3i^-r t . 
St,

itiiET-^T-v^sgxsicfci+smi frosts? 

$ *r-r s z t t -t ««tstts. 

OS: a 14 * ss-r «> *«* =i > e n - $ icgt^y * 1* & 3 

VRDUX KDliTEmi S?i/\f>aiS$S»it§3 
gi££fe;rf £>=i- Kir, 

fiJES2t£5i/\-v->ia^*<fifEVRD 'JX Klz#«-r 
&Ji-&lc s itfrE^-^T-y^igXgfiSMiT-fc^.kg 

iiir o $ i tt§s/\ -> mm & *s-r -s =• - k t . 

itrE^-^Tv^mssisicfcit^itrEmi *f ?o>m 
tas< *x h**i.fcss:iimirotN-cm2ffi?i/\->->i 

f!FEm2ffi5g/\->->^SS*<«5EVRD'JX KIC^ft-T 

£ £ to C £ £ 1#® <fc -T 5 =i > tf 3. - $ ? P * 5 A a a . 
[fB*3S5 8] ItrEa— Kl*. 3ia^£#toci:$1#® 

t-r i>ii*« 7 icE©o««f^. 

[0001] 
[0002]



-Sfctorosaa^ a > if ^ iz^ft f=»i=f^fiE * 

jh.fc=i>t?zL-$ ^P^AlcHU Ml-. C<D=J>tT 
a. - $ ? □ y 7 A*<E® * titz 3 > e j. — S> $ 

^ to => > tr a. - $ 3f □ ^ ^ as a °p i- H-r -5 . 

[0003]

[ftffiWWS] «X»«r (/<-v>7) tit. XS*& 

<o«i3ci=;^-r -5S«is<o^ x <? *<4>fe: < <t t^**v 
-So *-lt. -fiftwi^ -toxsarvy-^jrecoE 

ifi. XI*. -CO KCDliaWSOJ&Aty (logical chai 

*<yi-s-5t^=<titssi*. -fiswiCs xmfrt>xx.-£ 

T% XS£M§l=aXS*fr-i)^i:l=J:oT*fi£*^ 
S. 

[0 0 0 4] VJ—Mto't—y-lt* CUxlS. XMLXS 

h^^;u (dom) ffcssy;i — ^i*. &3gorfiS 
*7?:t*r (xml) xmizm^zmmv)- 

mry^'r—^B^ya^^^^^—yjL—x Cap 

i) &m&.ffl&Lx^&. te^. -r 'O y-i* 

S-(~it?g$B&-f So co«^li*S6«)lcl*p-;u/s*-y 

<S«f*<J&S'5:7^'J'y— >a>l±, file's: -T'O K^fli 
S-T-Sfc*)l3M> K-9 ^Slfr-T S. -*tl±^5 7 

^i=«t<fti-ctNS. 

[0 0 0 5] S<0/<— 9-1*. JSeHOTT'^'J'tr— 

^l^limet'SS. *t>l^ 7 >a>li. -t-O 
7 > a S (0 'J -«3gO«^ i: -r -5 

•«*.5fci67£ltl=. ei^-rSOTItlfSS^T-feS. -f'Oh 

y-i^^ttii. xML:stsi::*}u «fcymset?s 
jtyeuu^tffl77 -bxAtDrigt^y. ffifflweg'S 

*#ai-t"S. "Simple API forXM 
L" (S AX/<- y-ifSi-T) I*. XMLXtSIit* 
t-ttiO)^^^ hSSSS'f — 3i- X-CfeS. SAX 
/<— 9-1*. H2A. B2B. BI3A. g3B. @3CI= 

[0 0 0 6] 01 A, 01 Bl*. /<— y-i^X^AOT^D 
§SP#1 0 6|COtNT#§-T^o

105 <Shakespeare>
110 <!-- This is a comment -->
115 <div class="preface" Name1="value1" name2="value2">
120 <mult list=&lt;></mult>
125 <banquo>
130 Say
135 <quote>
140 goodnight</quote>,
145 Hamlet.</banquo>
150 <Hamlet><quote>Goodnight, Hamlet.</quote></Hamlet>
155 </Shakespeare>
[00 08] Hi B(Cfcl>T. XML^ttl 0 6lt Z 

<DWxit. -f^>hia)^-tt*fei>, /<—■ tM i 2^ 

m (DTD) XliXMLX4-7^f:, tM 1 2 

^y^B^t LtA^^^o /<— 9-1 1 21*. £g] 

0 6a>fis»»ft«3ia«sai*r*. hi Aicfet^r. 

ax^— Kx*>nu*>— h (css) xi*ffi5ix$'f/u 

i/— h (XSL) 10 4lt CSSXIiXSL/^- tM 

1 DTD1 0 2 t»*fcw(D/*— 9-1 1 
0^A2)£*x£o *BI-fcl*T* 1 2S. 

t;css/xsL/<-ti i oi*#iz % -fK>hi»i 

[000 9] XML(D«):567~^7 7^fgOTJ^(?) 

f&l*. XML0)E&JL<&1£iffr£>£Tl^o XML^f 
liit^vx<7 hltM&T-*XSV-f X^tl^o fit, 

\zt£Z>^ktf&< fey. ig^lc£oTI*F^£fc<&o W 
XML^^'J >$0cJ:?|Z. /\— K^xTftfl 

^fli^$*ti>ig^ici^ xrpi(7>xML^:sogi^^-r* 

li. -ttWlC. *y«j»lcl*fc&ftl*. £*>lz* XML 

<&Etzti+ pt&xz&wzz common fi&tffr 
tt & a*«>/ <— a t (* * * p 

[0010]



[0007]



ue1 ~name2=~va I ue2"> : 



[1] 



et. < : /quote> : < : /Ham I et> : 

[ooi i] *^0Si (D««-r?i*. 

*Sft!ti«iXf*^i:, mrE*^:7*<firEfgl * 

[0 0 1 2] #*M<©«6a>««t?l*. VRDl:ibLT 
ot, (a) 7-«77?»R»S, *Z7?«WH**l 

fiS*»gt-5Xf7^i:, (i l)»EK*a/>h 
«$aeT4Xfy?i:. (i i OUEKti^h 

*y3&««r<D K*a/> h^^fefcy tx<** mcs*l4 
(b) vrd*. *zxmi\ztiz>*yz£iz. mz? 

( i i ) »fcr***BMH=a3it* 

«rei«»si*xi=. (c) hek*i> 
[0 0 1 3] ^^0>^Oma$X'tt. VRDlcB§t>LT
o"C, (a) TOE V RDf, tOKfrV&iVi 24\tz®m 

sortie, sas-r^sas^xv^^^. g&sxT- 

vzft. ( i i ) m®.%tikmma>'\~jis*mmtt'£-r 
SAj^i. ( i i i ) TOEVRDosigggiirasi-j- 
iK i98H«SB*ronri3/vy>iasa^«3c«i4*tt 

M-r&Z.Tv7i:. TOE82:Si£l*Jei=. 

Siaixf-^ii, ( i ) TOE*:#«£gsiSfl>8l 
xai4satt5xf»^t, ( i i ) istaxs«itM 

$fl)/\7yaiS^t*Xf7^. (i i i ) TO 
E:*§CDl$|&£Sil-fc !+•£>. TOES:#8}>tSiitCDTOE/\ 

TOE&fi^&iiMi^ (c) m&x&otisnmm&miz 

*<. TOEVRDCOTOEffiJt^SII-fclt^ttJSffiXBttX. 

[ooi4] ^wo<teroSl«-c-tt. ixsis^fcv 
— > t zfmmxm & m^itt s iwfcara-e & o r . 
TOEfcSigiiiro— sk-^t, gtXMta^-f ?£ttM 
tsxf-^t. ( i ) TOES-r^tffn $-r z^efcti 

li, ?UfB«^:^ro/\^v3.S^^9lS-r5ci:. (i 

i ) toe* -r i mzfvtu+tiit. mtimxmm 

©EffiaS^StSCt. (i i i) TOEftXgSi* 

[0 0 15) *«wrofifea>S§«T-ii. S^ikfltxs^^ 

XW$k(»*-< ^StB&J-f&X^ ( i ) TOE*-T 

vaSS^StSCt. ( i i ) TOE* -T 1 * 

■T-SCt, (l l l)«EI»ftiM*S(M»t4 = 
fc. ro'>ft<tt-^ TOE#*Hb«:StggS £48131*- 

[ooi6] *H8aHt!i<D8S«TM*. «:*gSl£^t?^ 
— ?7-y:7Si§:£8£Stf-rS#?*rg@-efc-3-t\ TOE 

»XK*©*-f TOES-f:?** 

Il$-f^1?*li«^ TOEtSSSSilfcs -5-ro/\-v-> 
TOE$"f W1 *-r:?-C'fc&ig£KTOE/v;/->J.gSi 



£fl!^-C. TOEX#ro*fc< <ttSS#tt*8j£gSi£ii:*: 
t*S4i.-5. 

[0 0 1 7] #S£Bj3aHfe<BSSt*-C'li. V R D l~BS & LT 
oT. (a) 977^fgXSS. -£CT-t£»J;*:h. 

L. ^^Sl±. ( i ) TOEK*i> >S**-<0PgjHiiM 
£3i£-rS3Ua±:. ( i i ) TOEK*oLjt> h**-pgjf 
l= felt* TO© K*a>>h$^roA7->aSSI-iei 

TOEK+^>>h^^a>ffi?S/\-y>3.SS$3tS 
•T^^St. (i i i ) TOE h$-?A<TOa> K 

*zl*> h^y«ty £3E<*X M=S*i*Jl*l=. TOE 

> > h * ^OTOEi£5i/\^ viS^^ftiW-r 
Si:. S^*. TOE&SgBliMI-. (b) VRDf. 

-e-c-eit»j£*t5 sortie, m y Pis 

I*. ( i ) TOE*?<D|lIlfteS£3iSrf S3M3£. (i 
i) «t5-r-5f ypg^tfclt-STOrof ^ro/^yaSa 
l-2liEi£tti>TOE£ y <Dt£S§^*;> vaiSSSft^f 
St. (i i i ) TOE$-9"CDTOEffi3I/vv->i£gi£ 'J 
X M-te«i»rf£#f3i:. TOEfcSgfittSI::. 
(c) TOEK^ijt>K^^roTOEt£?I/N-v->3.SS 
TOE'JX Kfll-fc-i>*\ etlis fJia'JXh(0^>/< 
CD££ ft* :?-&•;> hra&S*0>i:*>e>^-e&5C<!:£:£ 
IE L. **t(::<fc-3-C. TOE^— >7-;/?sf§3tS0>SiS 

[0 0 1 8] *S6W(Dte(Dfi§*§T-l;J:. VRDICibLT 

it. (a) TOE V R D £ % ^(0JSA^T-^SiJ**tfc«5t 
tlz, «ia-r*^S$^. ( i ) TO 

sim^momxmm^-r^&t. (iotoe 

aJtS^W/N-yi/i^^a^-rS^-Si:. (i i i) 
TOEvRD<0«JtSSI=fclt5. TOE«jg^m©TOE/N 

•v>ig^ai;as:si4*»iw-rs^ei:. to 

Efc2SSI*MI^ (b) 7-^7-^fSXtS. 

SlXSttSStS-r^^Si:. ( i i ) TOEX««iggS 
<D/\-y->:iggi£JfeS-r (i i i ) TOEXS 

S^taXM14*«tt-rS?-®t. TOESSSg 
SliMI-. (c) TOEXSro TO ESit I- ft if •& TOE 
€-:*««i£SS<a«Xgte&tf/v:'->3.gSt<t. TOEV 
R D(0TOE«ita^l=*5l+'S)»tt;«S:Stta^/N ? vj. 
SSt^ibSL. ^*tl=J:y. UE7-^7 7^m 

*cos stt $gi2-r * #s $ ^-r * c t $ t r s e 
[ooi9] *mwomo>&&x-it. m-scw&zst;-*

mo>f-<7&&m-r&^&£. ( i ) we* -r i 

( i i ) WE*-r WSi *-f ^-cfci+Jhtf. 
frlB«tXS^C0ff^SS*3lS-r-&Ci:. (i i i)W 
E«l»*tflM*-r*wi:. <0L>-rHAM=J:or. WE 

[0020] *«wo)te<D6S«-e(i. ?HHM$t:fc^ffi£ 

r. WE#^b#:£gf$t©*^£its»]-ra3M5i:. 
( i) ftJiESf^tfmi *f ^-efc*T.tf. WE#-fHb« 
^t^0>3?/\-v>iSSS9lS-r-5Ci:. ( i i ) WE 
S^zWmi S-f :?T?fci**uf. WEtf-sHb&XgStro 
fg&SSEfcSiS-tS-i:, (i i i) fc-SHbtSXgfPiS 

&m?&zt. <j>&t£< tt— Dict-D-c. WE#-«Hb 
[002 1] ^SEECDtecDSHiTMi, 3>ea-$i:« 

mro^-r^£ns'j-r-s=3-K<t. we* -r i * -f 

^-efc-SJg^l-. WEfllXSSi*. tro/\7->aS8$ 

*s-r-5ctic<fc-3T. sis-f &=j— K«t. WE**? 

[0 0 2 2] *£!3a>tt<0tg : tiT'IS. 3>tfa- 

r Dies p> ltv— ? t ymmsnmaismm^-r 

•5 ^« * H fir * "fr -5 =i > tf - $ zf □ ** ^ A -e fc o T . 
(a) 7— ^Tv^mnxss. -C"w^»J**tS 

•7— 577 :?Sf§co k*il> > h* ^PSJli-fcitsm 1 
> hs-y-efcitttii. *&3-rs=i-K£*ru 
&=>-Ki*. ( i ) 15isk^zl>> h5»^a>^®<as* 

3l£-rS=>- Kfc. ( i ■ ) WEK*i* > hS^fUJf 
icfcif-SWco K*i>> (-^^©'\7->itSl:ae* 

-T*=i-Ki:. (i i i ) WE 3. ***<W CD 

12 K^zLjt > h**'cDWEt£$g.'\-y v'iSSISteitfrt-S. 

(b) VRD?> *ZT?ffiJH*:h.S*? , ;:il=, m 

K^-Ki*. (•) wE*?"cDPgjnaa 



■swcd$ v<r>iw ->a.Sisi=ae**i.*iirE^ ycosss 

A'r>a8SSaft53-Kt. (i i i)WE**" 
CDWEffi36/v;'->i£SI£ , JX M-tefcirf &=i- Kt. 
•ZSH. WE=J>tf J.— S^n^AlilEI-. (c) fir 
EK+a^> hS-^CDWEffiSI/s-vvig^**. WE 'J 
X hfll-fcSAv s£l*. WE'JX hCD.>l>/*CDSi!i^E-y- 
:?-fe•;/ hT-fc€>ii£l::l*. WEv— ?7'yZfgm$i&0> 

smiis * sis-r * =i - k s^t z t z &m t -r *> =i > tr 

[OO 2 3] *SEBJCOteC0SI^-C'l*. 3>ei- 51 ICV 

r d izm t. l r ? t ? zrgmx&<7>&£&z&&T 
^ei * sifir * -a- -5 =1 > e 3. - ^ ^ □ ^ v At? fe o x , 
(a) WEV RDf. *ots;fr-(i3&®\ztitzm&&mz' 

«i31-r*a— K**». B3- ( i ) HE 
«3£^SgCOft:*Jltt£3t5rrS=i- Kfc, ( i i ) ItE 
^itH^CO/N-vviSS^aiS-r^P-K,!:. (ii 

i) ifrEVRDco^jt^icfcit-S. ltrE1«jt^mcoiii 

l?TE^a>f^AliMI=. (b) "7— i7 7-^^B^3t 

ic. «&a-T«.=i- K$^, ^=1- Kl*. ( i ) SiTEX 
S1SigSmco«S:Stt^ai^-r*3-Ki:, ( i i ) 15 

( i i i ) IflEXSro^titSSI-fcltS. ftFEX^it 
^cO|trE/N-y->3.aSi:«^Mtt^ft«rt-ri>=i- K 

lUE^n-y^AliMt^ (c) mFES:Sco 
|tTE^5tS^lcfclf«)flJlE*XS«jg^cr>aS:Stta 
lUEvRDroitrE^JISSicjsit* 

c i: &&mt -r-5 3 >ei-ar *f^j*&m<&zti 

•5. 

[0 0 2 4] *fE?B<OteCO!ilfi|-Ctt. 3>tfa- 9fzWl 

am^Z&mir&zi—Pt. ( i ) ifrE^-r^mi 
mzfvts>ti\£. HlE^S:S^CO/\-y->a.aSSaS-r 
s:t. ( i ) 1MB* -f^1 $-frft?^EIt*t«. 

®f&mx&m0E.&m&£&&-fzzt. (. i o itr 

[o o 2 5] ^*wcoteco8l4i-ei*. ^vifi-^i-s 

SSHfiri-t-S^i/ei-^^a-y^A-ffcoT. WE 
»^b«IS:Smco^'f ^$!5SiJ-r*=i-Kt, (i)W 

e* -r i ^ -f ^•ps.+tis. wES-^-^axsmco 

i£/N-y vn.SS$34S^3)Ct, (i OiB-fW 

gn ^-e?sit^is. WE»#ibaxsscosa6as 
sast^ct. ( i i i ) stftixitss^t * 
zt. <*>'>£< fct— ox-. TOE»*§4t«*:ga£«&a
ti3-Ki. £^t?ct£$f&£-rs=i>tf3.-$;? 

[0 0 2 6] *fSE(Dtea>8S*$-eii. =j>fcf3.-$l::«| 

fr^i±-5>=i>e^L-$ ^P-y^A^. ft«ALfc=i>t:3 
— 9 RTH«<**^t? =i > tf i— * ^ ASSt fc o 
■c, TOE=j>tf3.— s^p^ai*. toe^x^!©* 

•Sli^lc. TOE«*S3S£. -f-0»\-yv3.gg|£3t5?-r 

4ctic«fc-oT. «ia-r.S3-Kt. TOES-f^Sfn 

^-f ^T-&-S.Ji-&l-MfH/\y->3.SS^ffl^T, TOEX 

[0 0 2 7] #3SB.BCDte<Bj§*§T'li. J>t!a-$|:V 

**«*afT*-tt*a>ea.-*:7nsr9A*. te&L 
fc a > fcf 3. — S< pT0S &{* * ^fc 3 V tf a.— $ ^ □ ^ ^ A 
igSa-Cfco-C, 1MB=i>tf3.— *?ny?AI*. (a) 

h^-yrti^, ^k*3.> *jjs-ts^— 5 

Kii, ( i ) TOEK*3.y>h$ 7<DPI/Hia£9«£-r 
^=i— K«h. ( i i ) TOEK^jOhS^pgJfdfclt 
-STO© h>a> > h $ ^ffl/W vaSSCie* iKfc. 
1515 K * 3. y > h 9 -Jf <Dt£3f / % -v> 3. SSi £3urr -S 3 
-Kfc. ( i • i ) TOE K*3.> > h*y*<TOa> K+3. 

j<> K^^<ty *,se<*x hi=**tsig^ic. toek* 

3.^ > h$?(DTOEffi3§/\-;/i'3.gSt£t&Sft-t-5=i — K 
<t. TOEai/fcTi — Sf^P-y^AliHI-. 

(b) vrdS, ^zxmMsixh^<fzlt\z. 

ypgn com i aiar-sn— 

K**U K^-Kli, ( i ) TOE* *"<&PgJHaM£at 
£n- K<t* ( i i ) ttJE-f •SSypilf icfclf&fij 
055» ->3.S^I=a®3?*vSI51E^ ya>S£5g/s-v 

•>aIS^t*3-Kt. ( i i i ) TOE* **<DTO 

itlEp>e3.-*zfn^5AttMI^ (c) TOEK 

*3.> > h*-yroitrEl£5S/\-y->3.g^. toemxi- 

•y K-Cfc^li^l-, 1MB-*—' * T^^mSXSroSStt 

* & 2 -r * p - K £ ^ t; ct £ 4* a £ t $ p > e 3. - * 

^□y7AS?a*<^**ti). 

[0 0 2 8] *fSWOTttOT8Sl§-C'li. 3>ea- 9I=V 

*^S!$iiff*-e-S3>e3.— s^p-^a*. ttttu 

fcp > fcf a— St rIB&JS(*£ ^t; p > ea-$ 7a V =5 A 
aA-Cfc-9-C. ttG?a??AI*. (a)TOEVRD 



- K«#*. M=>- K(*. ( i ) TOEtHitgfiicDftXJl 
t££*;rt£>a>- Kfc. ( i i ) TOESici^SiOT/wi' 

3.gst£3t£-r s=i- Kt. (i i o uiEvRDffli 

JSSSIcfclt*» TOE«itSHIffl)HrE/\>»i/a.asatf 
*£Btt«»*W-*=i-K4:. TOE^p^A 

i±mi=. (b) 7-— ^r-v^miiAS*. 

Ka- k», c t ) m&x&m&wssomxm 

tt*»S-r-5a— Kt, ( i i ) WEXS«SS*<»/N 
•v>3.aS$*^-rS3-Ki:, (Mi) BiilSXSffl 

«it8ai=«it«. nrEX»flua»*«>»E/\vs/a.a 

liiXItt*»l(lt4a-Kt. S^*, TOE^P-if 

^Ai*mi=. (c) itrE3t»a>WrE«sa3Si=fcit4«r 
E*x»«Jss*<o«txattai;/N^*>3.asA<. toe 
v r d roTOEetitsslicifsit /w *> 

a.aai=-ar***i=. toe^-^t-v^w^xsco 

3.-- s« ^p^7Aiga*<s«**i.-&. 
[0029] *SiS<ottroS§«r*ii, ^xs^^-^fc^ 
— y-rvynm2i&o>'j>t£< t*,»»ttftaaaa"c* 

ot, TOE^3t^mroW©-S^I=3?L. TOE-Sa<0 
*-f^SWIt*iJlXf9?t. TOE*-<^*<mi * 
•<^-efc*ili» -t(D/\-vv3.SS$3lS-r-i)-tlz«feo 

x*e>wm : £tiim-f&9ikm*T'"j7t* toe^-t^to 
e»i ^-efctiii, iffi'\7->ias^ffl^T. to 
ex»©*<k fc*»»tt*aaa«ta****a* 

[00 30] *aB8©ftfe©ffl«-CI*. «3tS9l$t&7 

(i)TOEXSt. ( I i ) TOEX»£S*f 
"f S¥-gl*TOE^'P-tr •y-9-|=Hfi$1i-S ^p ^ v AS» 
SALyi^P^^Ai:, ftttti^^Um. ^^"P-y 
5Ali. ( i ) llGiXf XO» -f ^M«t4a - K 
t. ( i i ) TOESi-f^tfSn *-r^T*fc-5>li^l^. TO 
EiXS*S. -ta>/N-y->3.a^**S-rSc:i:l-«to 
T. Kt. (i I i)IKEJ-fWl1* 

-f ^••Cfc*Jg^l-TOE/N-f->3.a^$ffltvT. TOEXS 

a>4Ht< t*a»««aaa*ia*r*=i-Kt, 

[0031] *fgw<Dte©SI«-ei±. «S:aa*^t?v 

r d tcgge, L-c«jts*$tt7-9 7 ^ zfnm-x&o 

SStt^8lg-ri.eS2SSt?fc^-C. (a) ^P-b-y-9- 
(b) ( i ) TOEXat. ( i i ) TOEV R Dir. 
(i i i) X«ro££1±£S&S-f £3MI!i£TOE:?P-fe^ 

L, (c) TOE^Py^Ali. (ca)7-^7??t 
iiti\£. ffllt53-KS^L> &=i-Kli, (ca
a) firKK*i>>h$?0>PllltaS£*S-f3=i-K 

t. (c a b) itriBK^i^v ypisicfcit-sitrro 

(c a c) 1512 K*i> > h £ ytm<r> K*a* > 
K^yj:y H=**l*lg^l^ fl*EK*J.> 

> h * ^OTHtriEISSI/ n v atSSftSHt -5 3 - K t . 
UtTE^P^AIiMl:. (c b) VRDf. 

I*, (c b a) S1JfB9yroPglHa§£*5£-f £=i-K 

(c b b) tuz-r$>$<ymm\zisifi>iii!0)$<fo>/\ 
v -> 3. ssii-ass * -k s we * ^coffi5§/ w> zl as * 

at^-TSa-Ki:, ( c b c ) tirE* 2"<DtilEftSI/\-;> 

□ ^vAliMI-. (c c) HirE K*i>> K^^flJME 
lirE'JX hrtl-fc-SAv fir 

IS:St§<DS*ai±£«2-f S 3 - K c <t * -f 

[0 0 3 2] *f£WOlftrofE§«T'l*. ®:StgiS§£^fcV 

S^ttS^S-r^^SSM-efc-aT. (a) ztn-by-y- 
t. (b) ( i ) ttEX»t. (i DUEVRDi, 

(i i i) xsay^mm^mi-r^mimmzfa-izy 

£*TU (c) ^E^P^AIi. ( c a ) HiTEV R D 

-K***. S=i-KI±, (c a a) SirEHJilgSStOS 
XBtt^SfeSTf &P-K<k, (cab) f!rE«iiglfia> 
v^SS^SiSt- S=J— Kfc. (cac)lirEVR 

Droffiitasicfcit*. itiEfi|jtll^fl>KrE/\-yvia 

y^AliHI-. (cb) "7— ^r^mn^s^ 

Sa-K$f*. (cba) fiJEfcS&i® 

^ro«tXSt4**S-r5=i-Ki:. (ebb) I5EX 
SmigS*a>/w>3.gS^9tS-r-5=i-Ki:. (cb 
c) l5EX«ro«JtS3SI=filt-&. ftTEXMJMHRO 
?JrE/\-vviSSt1i3tS14*«tt-rS=i-Ki:. £^ 
11rE?a??Xil*SI=. (cc) firE*:«a>iiIE«t 
jgS^I=fclti>l5E*3t««3&^OlS3tMtta^/\-> 
viSS*. UEVRD0ltrE<8JtSSI-fcltS«*E« 
XBS&tf'N 3.^51* it® U ^-^1-cfcyitlEv- 
? r ^miixsrosstt * «s-r -5 p - K *ftt? c i: 

[OO 3 3] *%W<0tea>8S«T-l*. VRDCIbLt 



it. itrEvRDicjsitsmi •5'-r^ , fl)St,sE< *x h 

tSXfv^t. vrd 'JX McHES! St3fi/Nf>i 

•c flt2tfcS'\?i' iss£»£-r sxt-^t. itrs 

m2i£?g/N-y->a.a^1irEVRD'JX HcSfctSii 
15E7-*7-;/:?Bi§:fc«l*£aT'fc^::,h£S 

[0 0 3 4] *fSI3<DteO>g|^|-Clt. VRDdggt) LT 

VRD UX KDWESl s£3i/\-v->j.as$ft«ft-r 

y\-y->3.SS^94^-ri>¥St. !irE!f5 2ffi$g/v;/->i 

[0035] #fg93a>tea>S§t§-ei*» vrd bit 

—5> * -S 3 > e i zf a V =7 A-e 35 •= 
SirEVRDogn ^-f^ostaK^x h*4xfcl8S:g 

VRD'M f-rottrEBi t£3§/\'y>iSS^ft«ft-r-5=i 

•^->j.SiS**S-r*=i-K«t. f5Em2l£5I/N-yi/i 
SSMfTEVRD 'JX Kirs«-r«.«^-i~. SfTE?-^ 

[0 0 3 6] *^^0>fl6a>SI<ST?li. VRD IZM b Lt 

fc-3t. k^p^tAi*. tUEvRDro^i 5-r^rog 

SS*3lS-r*P-K«t. VRD'JX h(-ltlEmi t£H 
/v'/viaS^tttSa-Ft. ItrE^-^T-v^S 
^XSlzfcl+^tlTESl J"f^fitS<*X h**vfc 
eiXH^ I- -P l*T m 2 1£56/ n ^ SI S9iS-r 5 a - 
Kt. lirE^2S£5I/\-yi/iSiSA*|iJEVRD'JX hi- 

a.-5 ^P^5ASSA<ffi«*4x4. 
[0037]
[0 O 3 8] *W®&lzmm<Df£W<D®L'&\*. XML^
^O^T. WtSfc6l*XML:7 7-<;U4 3 0M&0> 

oT. X M L/<-1f0)^ 'JMIS^>$^ 

*Sc*fl]i** MAUL W3C^Mihfc»*«o 
£a>S£<fcU *&<fcl**il*fc6£l\> **il=ftiLT. 

(hLtg^t^t^c fit, ^^yrtfwgfc^OJ^ 

•c<«frr*ci:*«-c#*. /<— y-i*. a*» 
[0039] zzv&m-tz&mommit. nm^m* 

^£*l£gMI*. XMLfg^tiLT, KW-TiCi: 
t+Zo l^L, Cttl*36B«<Da«»KHt«9IHLJ:3 
fc-r4ta>-ei4<Pl^. MiLtf* -ClcSB«**tfc««(* 

(UTF-160PSBI*. PfSStftlSO/lEC 10646-1 £#180 

[0040] H2A&tfB2B(*. ft*ft^(DSAX/< 
— tr&^2 3 6 £jFL*:t0>-efc&o C*Ll*. |»S % & 

[004 1 ] B2AlZfclxT. 7-^7 7«S (*S8 
J6»»lzfeL^TI4. XMLK*a^K) I*. fcJ&Xf 
7^2 0 0X~mfrtl&o *0'&* *g^f7?2 0 2 
I*. (-rttfe*flffi£*lTltttl*) 

(D&mz-S^tzte*. IUT0)Xf 7^2 0 4T?X^$SI 

0 2i=sivc. Hicxw«ajs*ittft*itf. mvmm 

2 3 6l*Xf 7^2 3 4t*87tl>e 



[0042] X J r»>^2 0 4lCg|t^r. fX hXf7 ^ 
3 6l* w W|Xf^^2 1 O^&tlo 

vmm2 3 6i*x»*i=ffiffl-e#«x*A<Mi=s«'r* 
^«flRt«fl>f?p 2 o 8^5. 
a>ffiffl3b«RTte-cft*ttf. A-«i2 3 6ir i*l* b * 

*<ttffl-e#ftt^«*i4. «B32 3 6i4" lm^x" *sji= 

^MfflX^r *^2 1 O^fSj^do 

[0043] ^arxfy?2 i oi4. a*4t 

* 3 -teoxstfliaflo^^yaai*. - 

asr-fcy. JWft'jxhtftijw. 

[O 0 4 4] Xf7?2 1 2 0>». 3 6 I4gff2tt 

fx7^$I^^>i:5^S^t§fX KXf7 
^2 4 2-^[Rl^5o *»tt*-x**l*. 
— h±0) http://www.w3.org/tr/2000/rec-XML-200 
01006.html t-a#t?£& Cffi^Rl^^^ — <77vZfm 
m (XML) 1.0 (»2B) W3Cttft* 2000*1 Ofl 6 

14. XftPHttftttfliaju— ;ui=mLTt^«^Jf5» 

*fT**L*£* »32 3 6tt" YES" ftBPlzJClST, 
X^" a" 36*6" d" |::»-3fcjS«2 4 6 14. @2B<7> 

»j6-r*«JMB(cift**iTfey. ti-c»i2 3 6*<i 

bl^^t^^ £»te^x^**<8SfT*ftft*ofc 
ffl3l2 3 6l4" ftWlciCC-C. fX hX 

t- 7^2 4 2**6" fiStt^xv*" «fr*'<**£ 
33^*flR"r*^ X hXf7?2 4 4^|rJ^5o S^fe 

nriBW3ca#5. ia5ic^^^^>^:#^ 

(DTD) <Dcfc5fc. S^14#^^tS (S#CDfcA 
VRDW) USBdtl«fiattO)«lftfCiH&LT. 
X#*0)«l*K*Oit«*ff5. DTDtXML^- 
•7l4S^tt^x^^A^fr$*L^ VRD0«t*4^ 

14^x-y^(CltL. «fcyj£«5HI=:8^-C\ SfROiELl* 
[0045] S^tt^x^^3b<^^H^)t. «&12 3 

6i4 w yes" ^^icjsct. fie-e^**ife»*e2 
2*tfcfrofc:«^ ««2 3 6I4" NO" ZzWlZfcC

8«m*xv StLfc*^ 46312 3 6 i***a 

2 4 6±0" a" ^bt^>a>^l2 3 8, ^(C 
-fcQ^fcMBtt^x v*X*rv:7 2 1 A^fafro* fllJl 

2 3 8<a* :?i>a LTOftgli. ^ST?H*4ltcS 

[0 0 4 6] S^tt^x v*A<g#£:h^i§^ M2 

3 6l±«jMS2 4 6 JK7T b" A^6*^i/3 >-9-^ffia 
240 H «lz«t(D*lcft&*t«Katt^x^^^-rry^ 

2 2 0-vfS)7bN5o &S2 4 0<Dg«(ttfcteffli£@T:K 

*tufc**»-e***t«. insist x a? 

l**^* 5&312 3 6ltttJ|Ltt2 4 6 JiCD" e"*&7* 
i/a>»RXf*;^2 26^5, ilttfi'V^^ 

8^*s*t*«k3i=:«jE»«ai;/*fctt«y»«*«fT 

3 6liX J 7 L *>^2 1 6^bit^l2 4 O^ft^lV 
C-ei**att^-x^^A<X-r^3f 2 2 OtrH^T^^i^o 

«r»a>«fc3iw % SMi2 3 6it Bijfi7>xf*; 

^2 1 6^b, t > L<l48»ttfi-^Wftl2 3 8 

IV J&S2 3 6I**JM»2 4 6 ±<D " b " ^bS^ttf 
i7>^f*;?2 2 0lzHSfSj^5o t^vaXOli^ 
tt-9-^«iS2 3 81*. HfflftttRtf-rX hXf7^2 4 
2, 2 4 4-CT*ftfc**I^W/<Xt4- ttftss 
(|2A» o 

[0047] 1iJ&CDcfc5l^ SSSfi'^Xf7^2 

2 0lt X»S!£B (dtd) |-#LT%l££;h,&^- 

(DJ£«*fi3il** 1*:7&3I2 3 8l^£*iTl^mfc:&g 
»tt^xy^*y 4iS«H(c*-3T. KXttjELlMt*: 

[0 0 4 8] S^lif x7^Xf7^220 iCjS^t, 
SUfi7^Xf^2 2 2-CBy3b<aUJ**L*^. * 
IEttft*«38fr**U ^2 2 4^^**i4cfc5I=ttl^/ 
l^ABy»*tf«T**i*. By#«a**ttt 

LMf^l*. /<— WI2 3 6l*7*->3 >Stt*&3l2 2 

I^t7^V3>^a«*tl^o @2A0)Xf7^2 
42M2 4 40)»SlZfc^Ta«ft*IR^T**t* 
I«Mf^l2 3 8&tf2 4 0(DI^$/W/U 

£<h* fif&0>J:3l::/<— X«ft3l2 3 6littJMB2 46± 
[0 0 4 9] RXSS^^ftofcS^ ^228 

a*«sfT**vct^*T^yy— *>a>(33i&*L. **<*> 



*ai*Bl***t*. ftltttl::* /<-X«3i2 3 6 14. 

JMUbO u d n ^ftfrlV H2 A0^f5t^i^* 
tlt:M82 46±(0 41 d" ^?»»fX hXf 7^2 
0 2^fp)^5o 

[ooso] xML3t»a>3cattiZcfcy. #<a>*^y 
<ticfc& 0 c©*«*»I4Xt-v:72 i 2izfcivc#f» 

^JLV<7^7-V^2 1 4atf8aSfiy*Xf^?2 
2 OCDM^T'SlC&o 

[005 1 ] ^(Z^-r2 00X^^^CD^(-|I^L 

-«tti=. 3t»<D»»^^y««i*««*3h.<nt*t 

X*5!I0>^x^^f4|gfT*4x<Plt*Ltf«:&« 
IV o*y. (i) B»$m*<HCTl***£3**(D* 

fc. »tta>»ttA«tti\A^if53b^a>^x^^(coi^T(Dx 

^**2 14fc, ( i i ) DTDIZft-f £*gjg<D-Stte 

<?&o>T xv* icfin^x. ( i ) <t H«a>ffla*<iB 

^<hfc&Xir~/^2 2 0<t. T'fe^p /<— 

x ML«£*<S?8r >^yti:xML*»fl) 
-ffl#$«#Lfcl**itfft&«i:iV SAX/<-»(D»4 

y+lc»tt*n<H+4xtftt&a:iV L**U * 

XML* *>:x§l»£ ffll^ti-- * fcS^IZ^ 

[0052] ftatt/w*>iT;u=fyXAi4 % ( i ) m 
@:Efi#*§-<t77U=ryXA (crc) (a«k ir-Zfr 
a-VMftMlcfil\T« »*»«rXI*Ky«ffl/ITjElc«l^ 
£*i&) (i i) ^iPXUX^bT/l/i'JXA 
<h. (i i i) *77>8^fc7;^'JXAi:, 

[0 0 5 3] -flgMlZ. @^/\7Va7;Ui'JXA 
I*. »»«i:fllW-e<tlt*ttftt6<HV Bl*»*-*U*. K 

*u b**^^— *-fev n**a>a*fCcfeoTSEft 

L54o *LT* *a>*5<Cx— *-bv M** ffift. '> 
*fc % §§&DTDi£lMiXMLX*--7^ y>***L 
yWXKfiSLfciS'&UI*. 09*14. XML
fc* XML DT DX»l-<:!ELEMENTX*M*,ftOltfca 
[0054] A* * *lfc7-^ 7 ?X»*(D 'J 7 7 U 

>xi4. a«ft/\^>a7;u^uXAic«#*ss^x 

l4**L£BiRr*fcfc(cm>£*i*. C*U4te<7>"?-£ 
TV?**** DTD^ X$-f ;i^>— H?>. X^X> 
a— r-f *ttfilB<P^*»H-C*«a>fc%j:3 

;uzf 'J XAI4»36 CDS TOffiBB tmctot LT&o Zk 

X £tt L fc' \ v T ;U =3 "J XA^OBSWft 'J77b 

it k&\z % *{*MlcX(4tt#»l--7— 

y. ^gi^vxfA^^^^^'jffifflt^ai 

^•jy- i/3>-efcy. w^'j^-v3>t\ tgy 
ja±<Dteo^§ dtd) tm\znc7ju*v 

[0 0 5 5] ±IBT^P-TT:(i. mteZ&Wtf'ZIt&T' 
205 Shakespeare
215 div
220 mult
221 /mult
225 banquo
235 quote
240 /quote
245 /banquo
250 Hamlet
251 quote
252 /quote
253 /Hamlet
255 /Shakespeare
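
As an illustration of how the tag listing above relates to sample document [1], the sketch below pulls the start and end tag names out of the document text and discards comments, attributes and character data. The regular expression and the names TAG, tag_names and TAGS are assumptions made for this example, not part of the described parser.

```python
import re

# Sample document [1], without the reference numerals.
DOCUMENT = """<Shakespeare>
<!-- This is a comment -->
<div class="preface" Name1="value1" name2="value2">
<mult list=&lt;></mult>
<banquo>
Say
<quote>
goodnight</quote>,
Hamlet.</banquo>
<Hamlet><quote>Goodnight, Hamlet.</quote></Hamlet>
</Shakespeare>"""

# A start or end tag: optional "/", a name, then anything up to ">".
# Comments begin with "<!" and therefore never match.
TAG = re.compile(r"<(/?[A-Za-z][\w.-]*)[^<>]*>")


def tag_names(document: str) -> list:
    """Return tag names in document order; end tags keep their "/"."""
    return TAG.findall(document)


TAGS = tag_names(DOCUMENT)
for name in TAGS:
    print(name)
```

The extracted sequence has the same thirteen entries as the listing, including the unclosed div start tag.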
[00 5 9] &IZ* @3 A\Z7f:21\&&?i£^ /<— Xfl* 
«3 4 4lzj3l,vc % *yMt»*yW4i:»LT* 



fcy. 055.I4. DTD/\7i/aSfigttlZtfcCi:3{l<f 

T. MtO>lt«*^ft£*« Cilery. DTD£ft$ft 
tStr^tt^lllLM^^'J (ROM) (D&£S$X 

[0056] @3A. 3BS1^3CI4. S&^fttcSA 
X/<-tffll3 4 4O7L/>v0)1 o£*Lfcti<0t?fc 
4o ^3A|Z^t>T. Xf '>^30O^bXf 7^3 1 
0I4. IS 2 A fZ|H LTiftiS LttfcXf 7 ^2 0 0^b 
*TVZf2 1 0!z#j6-r&*,<7>T*fc£o *T"?^3 1 O 
0)&. fXhXf 7^3 1 2lC^Ot, flLMtT&Jl'lfc 

^«)^fcAl:fXhStiSo Mf!i>b<$?(7)i§^l4. 

/<—Xffi3g3 4 4 14*83 3 1 6lzttoT. /\«r>aXf 

^S5Ri;6<7)^ , n-b^-^4 i 4M50 s&mi^ 

Utf. ft!3 4 4fl!)ftjSt?«ll*4l4^ Xf^3 1 
8t\ ft^Oy^'J 4 1 8&1/5 0 6 (H5&tf 6** 

sb) £m*TX«#&oy^yag^»AS*t£a>f4. 

[00 5 7] 03A. 0 3B;&tfE!3C*T:7F£*I&<fcd 

ft/<-xW3 4 4t, *yjE«ica»-r*fc«>. * 

t\ H2T?3?Lfc/<--Xfil92 3 6£0)|Sff"C* XML 
-tr^aS2 1 2I=£lvC. XMLS [1 ] * X S>. & 

T0)«i: 5 ft**; &rtfe v-<7 r v <ro>mm&mtf± 

[0058]



[2] 



4o [3] 0>pg]lS3tl4. fifir^A7V2i:^/:7 
«<B««I*. felTCD^I \Z7n$1x&£o\zm5Ztl. '\ [0 0 6 0]



Tag          Hash
Shakespeare  133
Div          326
Mult         371
banquo       787
Quote        629
Hamlet       411



[006 i ] ±E/\y>a."ey [i ] 

205 133
215 326
222 371
223 /371
225 787
235 629
240 /629
245 /787
254 411
255 629
256 /629
257 /411
255 /133

[0 0 6 3] 0 3 AC:M&<>:. ^-^13 4 41*. X 
^rv?3 1 4^t>^©T-^$*Xfc^«l3 56±(D 

"a" *vft|*5. #fH** "a" &t/ " b " \Zfc'otz& 
83 5 6 11 13 Bfl)»S5t4«Mt*4*it6W, 

-?-H(3||^L-CfilS3 4 4^l^^^tLTt>^ 0 

[0 0 6 4] H3BIZ»U. Al3 4 4(t iSB^C** 
*lfc«JM»3 5 6±(D "a" 3^6. *»tt^xy^A<il 

< 0 *(D^xy*j&<5gfT**L*±* ©3S3 4 4I* TYE 
SJ ftBllCJ&CT* affB3 5 8±0 "c" ^|RJ3^5. 

S3 s en H3ca>»i6-r*«iMa(-tt**fcfcy. 

^x-;^A<§|ft*tu>S:^ofcig^. SB313 4 4I*. Tn 
Oj ftflJlcBCT, R»tt*x**MlfT**i***f3 
^SHRtifX hXf7?3 5 2^W5, S^tt^ 
x^j&<|gfT**t&£* *&5l3 4 4li r Y E S J ftSHZ 

^x^^^m7**v*A^fca^ii. ^s3 4 4i*M 

©3 5 8-t<Z> "cT ^(Rj^5o 
[0 0 6 5] Bl3CI=|q]**3£« SJ^tt^x^^A<SlfT 
**lfc«*. Bl3 4 4lt jM*CJF*ftf=*JM»3 5 
8±(D "c B S^tt^x^^X^^^S 20^[S] 

— a»tt*x**tf»***ttt*ofc»d. 
©313 4 41*. &l§T-iF*;h,fc#^l§3 5 8_t(D " e " 

£^fx«^Xf^^3 2 6^It;o 



[0 0 6 2] 



[3] 



x y 0 t SBStt? 1 x v *? <Z> E% t> t S« 2 o i§ 

£\ ©313 4 41*. MT^$tlfcM83 5 8±(^ 
"d" 7^v3>I^f7^3 3 4-sIt;o 

[0066] SJ^ttfi7^Xf7^3 2 0lt 
|»^Pt7^4 1 4&IX5 O 5£ffll>T. gflStt^x-/ 
*SS6frU lfiftA!3 4 6©-«l#$»aLt^ 

©3l3 4 6(DM#tttefif*£ig£^T^£;K£o 

S^14X^y^3 2 6tt»?*l3 4 80)-Si 
[0 0 6 7] [3] fzS»ti«NM*»cj:iJ. 3t*W 

<DfWWW«**i*. <**ic. [3] -eiicS*i.«IBJI8 
SI*. [13 T-***t-5)B«aSJ:y t^'JMKi 

nrfey. **u::»i6l-c [3] vmzti&mmt* n 

[0 0 6 8] B3ClcRy. »»tt^x^^*ta«!S 
/^AI3 44(iByfi*;*Xf;?3 2 2^14** 

5o cc-ei*. Ky^tttt**v4i, *93 2 4t^i 

*X HCttoTl^ftA^if 3^S-«Wlc#*-r*. tto 
T. mTLlt. [2] £#B9-f&<b. "Hamlet" t "/Ham 
let" O^^^TI* "Shakespeare" t "/Shakespeare" 
"Hamlet" $7<7lt. Sc£(3 "Shakespeare" £ <f *

l£. 5lMzm£y^oTl^l>rt^T*fe&o 
[0 0 6 9] -75. Ky*<tttBa?*tftA^ofc«*. /<- 
X&S3 4 4|J^-^V3 XDj5il3 4 8^fp)^5o 
(Zfcl^T. DTDS^XMLX^CgLt, *JJ&-f 
-5^P-fe^if4 1 4S^50 5£fill*fc* SM^te^x-/ 

t^t'fe^ e Sot, ^J^tt^x^^li. "Ha 

mlet" (DZW^Ttf "Shakespeare" 0>£ $f ^TOPel 
KIEL < *X hlZ^o-CO^xh^^-TCDIZ^L. SM§14 
^x^li, d*t£l±S&y, "Hamlet" OZV^Tfr 
"Shakespeare" 0^^70Fll^i[^^X Hzfco 
TU6<h:l>5iELl^*X h0^x^^/£ltT*^<. "Ha 
mlet" 0)$7'<7tfZ.<b75)£X^ jESfc*X hi:ftot 
l***£3*<D*x*$fcfT3. Ma. 12* §118. "Shak 
espeare" 0^<f^7t^ "Hamlet" <T>$ ?<D*7<Om\Z 

4*l*A^*53&MciPx.. "Hamlet" $<f*7tf "Shakes 
peare" CD9 0>t7<DlllllC*X M=tt U 3*. 

[0 0 7 O] fiSttAT"^3 2 6 SSHT-r«fca6. D 

^3 1 8 T?iJSiftfc7-^ 7 7 ?XtO)/\7 vaOtt 
Mlc-Srf ££51^ DTDXIiXMLX*— 

< e »3tt7-xs/*XT^:?3 2 6(3\ Xf7^3 1 8 

$ ttt* DTD/XML X^--7<D«3gS^<t £lt|£-f 

505 <133>
510 <!-- This is a comment -->
515 <326 class="preface" Name1='va
520 <371 list=&lt;> </371>
525 <787>
530 Say
535 <629>
540 goodnight</629>.
545 Hamlet. </787>
550 <411> <629> Goodnight, Hamlet.
555 </133>

[00 7 4] &T*70>mmt (-fiStt^l*. MteZV 
l;:<;section>:£l^«^£ffl^&Ol::>tt L. <:/section>: 



Z*3 2 8&1/3 1 8 0/\^*>i3tff(Diea. CCT\ @ 

[007 1 ] 5^ttfi7^(Dl «kS3 44l4Hy^- 
x7^7Xf 7^3 3 0^fS)^-5o CdTrli. £:B13 3 2 

*T**l,*. ISy^fc^ofeS^ /<-X&l3 4 41*. 

*u 9?<bX¥9i&#\t+ ms2LXfe\z&&4-\ 8xii 

5 1 2(D> : E , J^b;^$tl^o LfrU 

\t % *ya>x*aa» : Ey*ai*«»*Jh.aL^ s^<& 

* 'J SSEI*. Z:W=J:y/\?i':iaHI«f£l*fcft*. 
^^33 4 OT?^$tL^<fcd(c. iSiaTf3FS*ifc« 

^83 5 8±(0 "f " ^tmy&L* *<D'&®3B0>& 
«"C***lfc*IM»3 5 8±0)»Kt4 M f " ^rs]^ 
IV *(D*jStt-C5**ftfc»*tt3 5 8Jl<& "b" 

Wti3 ACDA«t?**#tfcaiMsa3 5 6JlO> 

-X«II3 4 4ltXf7?3 4 2t87tl)o 
[007 2] XML^SSP [1] it. /\?*>j.»3S<Z>$i 

[0 0 7 3] 



I ue1 "name2="va I ue2"> : 



[4] 



<:/629>:<:/411>: 
*a>^^<D*ias/\^viiz-r citric. Httttt=ii

http://www.w3.org/tr/2000/rec-XML-20001006.html
T?A^-C#* (XML) 
1.0 (B2fifi) W3C1§, 2000^10^ 6 Bj (D3.1BIC 
^.ibft&cfc 51-. <:/Name Attr i bute> : |r -3 T ;k £ *X 
& e gl£<7)iH&T*l4. "jy" 14. SlfifilCcfcoT. #J® 

■TS. >^E'J*T-I4 V X* (*S7*^£i8S'J 

-t&) t«iB-r**ai*«i»-r*cfcait*»*:ft*»* 
t£y*«. chi*. J6iTo**a?*as:Aa-efffe*t 

«. Willi. ( i ) *T*<fVt>*Z.k**1rtz*>lz* 

(i i ) />^i/3.$ya>t6*y^iKt)ya>ttJ8**-r^ 
— A>fiteJBit*c£* xi*. (i i i) S^l="7v^> 

*(f&*L«. (i i i) BM6^y0)/\^->a. 

( i i i ) 14. »»e^^/\7^a7^yxA 

133
133.326
133.371
133.-371
133.787
133.787.629
133.787.-629
133.-787
133.411
133.411.629
133.411.-629
133.-411
-133

[00 7 9] [5] (CfclNT. *X hlZ%:^tz$7(Dm 
at(4. [3] ■C3F*#t5IB***eu -I0ie/N'r>a 

So SB« [5] (D3-*<DfTl4. [5] <Dfr4atffT5 



( i i i ) l*^— ;u^9«f3r^5/a> (i i) k&kA, 

zmcxSbi. 

mi a**<rtfi** hiw*oTi^<tc5-ei4cfcy^u^< 

14. *^123A^^987a>rta»=*X MCttoTl*5£C 
5TM4. *X H::ttofc»?1230J:?lC^-*-ft:by 
13. 987. 1230) «fc5l^t^t A<Tr^^)o C0>*ti£<b£ 
*lfcy\-V>iXI4 "ffi5S^tttc w /W>3.|C«ty. *fc 

[00 7 6] tt9£*ifcS9lf*. /\^->a.lz*^<iBJi 

^£ttltfct(0T*fe£o ftot, Willi. "Shakespear 
e. banquo. Quote" 0>f&3t(DM&X¥&\\t+ 3 

[007 7] XMLS [3] i=»-*-«*«<b**ifcS» 

<D/V>*>i*Jf!U*fc. VLT<D [5] Tr^tl&o 
[O O 7 8] 



[5] 



&LXmZh^t>2^tz^ oG>a=p-e8£*i£. CCD IO 



->01 3307870629 
->-01 3307870629 
[5] om&Z&'miz&Ltzlttol*. ZLTO)
01330326 
01330371 
013307870629 
013304110629 

[008 1] tztz. DTDXttXMLX^-Vgig^^ 
CJ5}£\z& otitu t^t*^l» 0 

[0 0 6 6] »ffi:/\^->jL3&<fTto#LfcA*X»l&^6a) 
^^f-lz^h^. jgff i / \ 7 va A^ntr D T D O)ri0 
1 OXI**g»(D&fI<Dtti&l*. XML/<— tl-C0S^14l^ 

DTDXIiXML^-7l:J:otSi*iifc^: 

* h =i - hMt-e * « d t tft>fr*o 

[0 0 8 2] H4I*. mx.lt. DTDXIiXMLX^- 

ZtztbOmme 0 0&7FiLtzt,<DT-foZ>o *£3g6 0 0 
I*. feliE$ni»*7-^T^^^*<II^^X^^^6 
0 2^b^^ o ^f*^6 0 4T% ^(Ot£ 

31* ^1*. mSX\*m6<D&*(»Zfn±yV4 1 4X1* 

5 o siCcfc^T'j-tr-y h^tt^o 04a>i&gstc*5^-c. 

6T*I*. fV^'J — £^<7>;u— hi*. #*0>:7a-fe*y 
1J-4 1 4X1*5 O 5CD1 Od^oT'J-fe-;/ h£*U -£<D 
^7— ^T'>^X#+a>*<Z)*^A^ Xf7^608 
T'^iJStl^o fXhXf7?6 1 01*. Xf 

^6 O 8^glJ*^fc^^^^^T*fe^^^^^ 

fdBf-r^o "fx hx^f^^e i oaw k> h-ei*. *&3i 
6ooii ry esj ^^njic;i:txf ; >^6 i 2^fSj^ 

5o *TVZf6 1 2T*I*. X^f^^6 O 8T*^iJ^?^fc 
S^*. &/7CD^P-t:*V-9-4 1 4X1*5 05(D^10 

$m*fct£3ft*^l::fln*.So A16 ooiUf^ 

^6 1 235)^bXf V"?6 0 8^15, 

[0083] fx i o*<. x<D*tri*mife 

matter. ffi5§*^*< "o" frtzofrzm&iLicmo) 

t^fXhXf 7^6 2 4^fS]^5o t.Ltn-$^« 
^(C|* % &S6 0 0I* TNOJ S^lrJSlSTXf -;/:76 
O 6— fo)^5o 1 4T "O'OfS^ei 

[0 0 8 4] fX KXf 7^6 1 4 A*. ^<Z>te*< 



[6] CDJ: ^('TF^tt&o 
[O 0 8 O] 



[6] 



"0" T?4l^tMBLfc*4, AL9I± TNOJ *EP 

5o *5-ctti***i::i** A16 0 0IJ TNOJ ScEPIz 
JfcCT* £.*<D^»J41 8atf5 0 6*CXS'JXh 

fX hXf7?6 1 6A<, ffiSI^^f 'J-^y 

co;u- hi=«Li^ctftfiKLfc»«. *&3i6ooi*# 

*©^n-b^-*M 1 4t505 £JBt*"C. ry ESJ 5k 

jmieooit xf ^*^6 2 2-effi3l*<7£f 

O 8 

[00 8 5] fX^Xf 7^6 2 4l:M^i(:, Cttf: 

* * ft & "7-$ T * IfJJ^ 3 £ L U |f*U* 

fcfcfcl^ L^U BJ!telCl*iS&£ftTl>fcl^ 

tux h$fx ht^iiVRD yx h#fE y tti$tii)o 

*&3I6 0 0&t/VRDlCInl^5^^^-ri)«l^l*. 
RD 'JX hA<Xf 'y^6 2 6 CDHricSfitStL**^ V 

rd»jx h*aji6 o otiiRWjri*Hi»icftya-rc 
c£*<t*£. esoux h(*xf ^^6 2 6<D«rrz«ka 

6 O 0 [Z-^xP>tV^)o 

[0086] 2 6|zM^><h. ItT^^aiCV 

RD'jXhMX¥T*#5(Dt\ Xf7?626tt»'J 

*«»r-r^o t L-tar-feSii^lCl*. «il6 0 0li 

r y e s j ftwizjccr. x»i*Ka-ca5*tas-r* 

Xf7^6 2 8 [C[S]^5o — XS'JXhAWRD'J 
X h(rl*^P,tl>S:t^A*Si#Oli^. AI6 0 0I* Tn 
Oj ^miCI^CT. Xt^iatfe«>i:lftl)Xf 7 
^6 3 0|-fS)^5o 

[0 0 8 7] Xf 7^626t^bl:PL<^b^lt 
1^**913. WIBIB^T'I*. X«»JX KCO^TCDA^^: 
VRD 'JX h0>£T(DA2l£ltg*r&o 
I*. Xf 7^6 1 6 0ftC, Xf7^6 2 6i:P80X 
:tlJil±^cff^ h£f7*rf^ «il6 0 03!)<Xf7
J|ofrof::i§£\ jS16 0 0liXf7^6 2O^i:[p]A> 

tct£*,ics^it«i^^»7r^)Cih^r*^^o vrd'j 

01
02
03
-03
04
-04
-02
05
-05
-01

[0 0 9 0] «t£®fr£>W*.li. 0 OUfcV-^7 

:fc>*> [ 7 ] <D£MfTlzfc& "or fr* [7] <DWZ31t\Z 

-03 w fit, f(D^0S(ost^i^ 

^^itffi3l/N'r>aia, -T*^^ "010203" £«fe* 

CDgl^^y (CttAO)**. [7] (D5fT@(Cfe^> w 0 
4" ) $iOlt§*T% $S7*^£}TC\ &£«IS)£o 

tti^«»/\^2/a.a«ta«5Br. x^r?6 2oii. 

DTD/XMLX^r-711 
^6 1 Ot\ 7^6 2 O0M/N'>ViSSlCfc§ 

[009 1 ] Xfv?6 2 6T*(DirX hi*. — flftWI^. 

«»|:DTD'JX H:t**t»«3l««©*^t 7 
ht'fc^o Sot, S^^XMLXSIi. *jj£-tsvr 

6 2 6fCfclt&-fi&#jfc7 L X UN*, DTDOcfcyaSl* 



8 i*#a Lti^xt^sa-cfciz <t 
[oo8 8] wraG>fiatta>*afet*&rcKwr*fc«) 
13. )5iT(D«^aj (c;:tn*§m&*$w "or** -05" 
T*fcy. »et5«7$yiit*if*i "-01"** --05" 

[0 0 8 9] 



[7] 



7viSI "0123"!*. »«t4DTD^&0/\7va 
S3! -01 230456" t VcML Lfc»$. r»Stt#fc*j £ 

[0 0 9 2] H4fZ*S*LSfiatt«Ba6 0 0tt. KiS 

[0093] Sitt^eoo^iLt raw 
sai-v>Kyffla35<jtfT**t*. 

[0094] ftTEAftl*, «ji*x*$il*8dlc*fcl* 
[0095] ca)S^tt&t;Sittfi^^^S, 
zttfvzz>t^ommt<&ti b^i). ztiit. 'j>ta<
^fr-r^ct. xii(ii)s^T*^t^#(7)m2CDaj^$ 

icggfe-f^C* (X»0>Bl «#0>»«tf*tU::J:-3r 
[0 0 9 6] I^tc^&J "Tat^. * 

H**i*nrtttt*<fc*a^ B«i=Bfrr«fc*>i::^B 

±tt/\**>i»i*«BH-*tu SiRStv&o 

[0 0 9 7] XBttBaBKBaOfeAaBBMB. x 
14. ttm<D'£m&Zf%to7*—'*V H4. A7va7^ 

atfxia^tt^at^Ascii»*3&<Rr«6i:ft4. ankdh 

jWff«-r*. (B±tt) A7*/at ^(Dffffi 

i&jZtmtt-e&^ttfm-r zh&o ;:tu4. xml7 
*^»&3h*. =L-—?v* AMA<tt*n?>-ri*»*i^ 

[0 0 9 8] l»S»-C^f=«fc5lC. e&Bft£c:*>|z % 

s#iifcy, *A/*£y racers*, ctu** w 

B*Lfey* 7Kij>{rLfc««BW, 
1 OiSliOT— ^7^^*y*AlB^K^«^S:*Mic 

Ajto&Bfc*tl4»*lcm*&tl5. ZCrtt. JMIC, 
«J*r^By^x^^(Dfctolz*<DJ:5ftCi:*!T3jBB 
I4£l> 0 RTi£/\*v>3.T;U^yXA. XI4i£/\^>J.7 



:J»JXAI4. 7-* 7 7 ^XiOSB» 1 9f fl(D7 7 
XA. XI4ffl/\^vzLT;Uzf'jXAI4. S«ffllzBEi::# 

**t"CL^a>T?eai***t*t^ 7-^7 7^tt' 

#SB£ti&i§^tfc£o PliE/\^->zL7;U=f'jXA. X 

i4Rii£/\^i/zLT^=f yXAici4. (i) 55±#«*r/u 
=tuxa. &c;(ii)/\77>»«4b« *<«£Lr*if& 

tt£> 0 

[0099] ±e««i4. #tosm*0>i oja±*«ar 

(#Jxl£. XML. DTD. CSS. X S L If) £HF 

UTF-16) SttfflU Xtf/Xtt. ^^«Og$l4~^W 
|:t©/\-r>iS8J:yt.fi<4l\ (iii)v-*7*y 

^x»*ffltx*xi4ait»*ffifflT^u^— >3>i*. 

7— *?7V~JXM. XMLX^r- TXUtDTDC&^lC^ 
ftl?*. 1 o©W ha>B«l-"<^*#ofc. BttttftB 
MSMMtt*. (iv)A*-*— *T*:/X»I** 

tf&srfcSu (v) &i;/xi4. 
ryvr-i >3>i4. y^usa («*.i*. *a*&^5S 

XI4«««C PUvXfA) Xliy^ySS (Wxl4. 

ffiS>^y^au>->x^A. Xte^-f^S ^>^'J*< 
fj y ar b*it vxfw ic* li*mb*<&«* 

^3t»±T?*^. < BfM" 2 7 v Zftt— 

-0-&1//XI4. 7^^-va>t'M. 

[0 1 0 0] *MW 7 7 ?f HJt* 
0)B«»ai*. 7-*77^tKt©*MBfc X 
l4-9-^HB*S6fT-r«. 1 oja±<7>mm[HlK^a)llffl/N 

14. BB^n-feytK tt-t »*;ufl»:7n-fe*U\ XI4 
1 oiilJiOv-r^ Q^D-tz v9\fttfBB*^y StLt 
l^Ttcfcl^ 7-^7 7^fgXf^)S«T^lt - 

> e^-^ i/Xf A 4 0 O ^L>t, 
mfT-th^ttfT-ZZ* f^ia^vXf AI405r^ 
£*L&o COvX^fAlCfcl^r. 03 A. 03 B. 0 3 
C. &tf0 4(Dfll3II4. i^3>fcfa-^vXfA4 

o o *r3MrStL*r *r-is a >^p ?7^o) v 

? h<t>x7<t LTBfT Stir 4*1^. 

^A4 0 0I4. &£!ttlcl4. (0lzl4^*tvr 

t^t^) ^0>X> KvX^AICffl^&^tL. ^'J>*« 

cfcor. -?—<?7 v^»Ba>»Sr»a©XT-t/^fft3 

f(DV7^i7lt H*ttL»ffl^*y (RO 
M) 4 18. ^>^AT^-bXA^y (RAM) 4 1 
[0 1 O 1 ] i^3>t'a-^vXf A4 0 01*. zi

f^va-;u2 2foA*si, y3?*tt**ta 
(lcd) mo&JiStm* ^»j>^x>i;>4 
0 2T'M*tL§ o >tfi—* 4 o oi*. a 

£ 0 a>ea-**^ h 1 ?— ^4 04lCgtS*tlfcflfeCD 

I*. Affl* (I/O) 0^-7i~X4 0 8^<7)Jg£! 
4 0 4lZj:oT. »^a^n>fcf a— i > 4 O OlZ&£;ft, 

-So 

[0102] ifcl&^avtf j.— st^zL— ;U4 i oi*. 

^□-b-y-y-iL-^ h4 1 4&tf>^'Ja-7 h418, 
Mfc. Willi. ¥Si*^>^AT*-feX;*^'J (RA 
M) . ^aX^Hffi>^ r J (ROM) . X-f^M^a. 
— ;U<t LCDO^-7i-X4 1 6^^t7Atb^l ( I 
/O) -f>$— 37x— X. ?'J>n>v>4 0 2t^ 
^ -27 4 O6ffl0)AiiJ*'<>^-7x-X4O8f 

*g/££*l£o >e^-^ 4 1 0<D««S*4 0 

8&tf««S*4 14-4181*. 
/U4 12S^LtSlLTfey, C*ll*B3SttffiT?*n 
b*lM^3>tra-^i/^fA4 1 0(Dtt*i^ 

^P-feylM 1 4*<:/Dy^A$R^&A/^tfJ«l-r*. 
[oi 0 3] 7-^7-^fg»©jB«r*a 
I*. @6-C^^tt^J:5>fe. «*SiR.ffl3>tri — 

AlZfclxT. @3A. §3B, ID 3 Co &tfH4 0>ffl 
311*. 3>lfa-^yXfA5 0 0+T^t^7^'J 
^-va>^P?7Af0V7 h^i7i LTS8fT**t 
Tt£l\> :(D7^'J^v3>lt 09x1** /\^->zl 

LTffl^e>*L*«^lct6ttOo 06 1*. 8S#ait^ 

[oi 0 4] v— *7?:/MKt»<D«*r*fta> 

i70$^i:«toTfT^^ 0 f0)V7h^i7li2O 

I*. ^(DV7h^i7lt eiTlC*-TE«SI«**"r* 
t7xTI*n>tfzL~^ Pj^jg<*^t)3>fcfzL-^(CP — 



h->i7t*«a>i;a->WMft XttUT 

[OI 05] 3>t 9 a-^yXfA5 0 0lt =J>tfi 
-nva-il/5 0K TfC— K5 0 2*tf7«>X5 

151 4**t?ffl**«-CfllSl**t*. £H-«iS (^ 
fA) jHS<iSM5 1 6 I*. 0H*_l*. i§85 2 1 XI* 

0^5R73lPl-efi(i-r^fc*!>lz. nyea-^^va-;!, 
5 0 1 -Cfill^€>*l6. ^fAS 1 6 1*. -f — h 
m^«;h7-^yXfA, Mxli. P— *WJ7 
K7-^ (LAN) Xte^-f Kl'JT*^ K9-^ 

(WAN) . □ >ea-$i 5 0 0 tHILTt^te^ 
-Vt/^>fcfa-$ (PC) 5 2 2*'v7*-feX-C# 
&cfc5 f::^&fctf>l::fi3l*&*L&o 

[OI 0 6] 3>bf j. — ^va- ;us O 1 I*. tl^I 
1^1*. W<i:i1^^Pt7lfa-7h5 0 5, * 

^-bXy^E'; (RAM) . BM^^IMM^'J (RO 
M) . eft^f >^-7i-X507 ^f£A* • a 
* (I/O) -f >^-^x— X. +-*-K5 0 2Xtf 
7^X5 0 3ffiC7>A^I ■ fcti*K>£ — :7x— X5 1 3. 

. ^fA 5 1 S<Dtz#>(D'( ^5 — 7i- X5 
0 8Tr<SJ$£*l&o 

[O 1 0 7] E«K»5 0 9*<KI+&:h.Tj3y. mut. 
/\-Kf-fX^ K5-f ?5 1 Oltf7P^ t°— (eeia 
«) f^^7 T 1 tta^r-^ 

K^-f? (^H5t) *ffll*T*>J:l\> CD-ROM 

^7"5i 2 1*. *wtt©f->at LTa«fis^.t>ti 

& 0 a >tfzL — nva- ;U5 0 1 (D«J«5S§5 O 5~ 
5 13 1*. «S18«**tfc/*X 5 0 4 ^Iti^fi 
L. ^ftl*§ia&«T-» t>*1,£ =i >fcf:i — $ zs*t£>5 

0 O0>«E*Blft*— K«)*5I=«:4. *S«»«(CfeL\ 
T|^fT-e^i>a>e3. — ^1^1*. IBM PC. IBM 

&tfSun SparcstationsX(£?::**&%BLfcRtil 
tti/XT-AtfftllZflSlf £*X&o 
[0 10 8] JUtttfcf** *Hffi^fi|fzfe[t^T^'jy 
-va>^P^7Alt /\— Kf-fX^ K9-f ?5 1 0 

T'P^^ASa^tafflltio K9— £ 5 2 0 

1 o^aBLT. ^tM^eu 5 0 6Sffli^«ait 

S^(cJ:-p-C(*. 77*'J^-y3>7 , ny7A 
I*. CD-R0MXtt70^ fc? — *T-f X2 lex — K 
2X1*5 1 1$<MtS^«^*\ Rt> l )lz s =Et
Agf5 1 6*^^*7 h9-^ 5 2 0ltPC5 2 

2 b zl— if iw cfc o Tgfc£*uT £ <fc i>. 
[0 10 9] Kft-r— ROM. JftaialBk 

*;u. PCMC I A*-K*0)3>tfi-*Rrtt*- 
5 00l:V7^.x7^n-Ktl>C<h4,T^5o -tie 

[oiio] (m^±<Dmmftm &Lt*&. *^^<t> 

ftl^By, «IEatf/X(4«S*fT5^i:A«"C#*. -t 



[Si A] **BBa>»jBft*lfi^fi^«*XML/<— 9" 
[Hi B] **MO»iSttjglfi^»l^ff*XMLy<— 9" 

[H2 a] a«tttt»©tt^x^^atf/xi*»att^ 

[0 2 B] a3Rttot£&f&&*xv9&lf/X.\t&£1t* 

i7^$*t\ a*a»i:6it*sAx/<-^ax 

[@3A] S2A&i;i2BOSAX/^SafiLfc 
[03B] H2A&(;@2BCDSAXy^^»Slf: 
B3C] 02 AfttfB2 BCDS AX/<- *£3fc&Lf:: 
[04] D T DXIiXM LX^r-7f 
[05] JftStlfcSAX/<- »f©««l**fT"C**» 
[B 6 ] 3tA**tfc S A X/<-tHB««£ *frT?#*ia 



[Fig. 1A] [Fig. 1B] (drawings; reference numerals 100 to 112; labels DTD, CSS, XSL)

[Fig. 2A] [Fig. 2B] (drawings; reference numerals 214 to 236)

[Fig. 3A] (drawing; reference numeral 344)

1. Title of the Invention

HASH COMPACT XML PARSER 

2. Claims: 

1. A method of parsing a markup language document comprising syntactic 
elements, said method comprising, for one of said syntactic elements, the steps of: 

identifying a type of the element; 

processing the element by determining a hash representation thereof if said type
is a first type; and 

augmenting an at least partial structural representation of the document using the
hash representation if said type is said first type. 
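
By way of illustration only, the three steps of claim 1 (identifying the type of an element, hashing it if it is of the first type, and augmenting a structural representation) might be sketched as follows. The tokenizer pattern, the names `hash_tag` and `parse`, and the use of CRC-32 reduced modulo 1000 as the hash are assumptions made for this sketch; the claims do not prescribe a particular hash algorithm, and self-closing tags are not handled.

```python
import re
import zlib

# Hypothetical tokenizer: a tag is "<...>", anything else is character data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def hash_tag(name, modulus=1000):
    """Assumed hash: CRC-32 of the tag name, reduced to a short numeric code."""
    return zlib.crc32(name.encode("utf-8")) % modulus


def parse(document):
    """Return a partial structural representation as (kind, hash) pairs."""
    structure = []
    for token in TOKEN.findall(document):
        # Step 1: identify the type. Only tags are treated as the "first type";
        # character data, comments and processing instructions are skipped.
        if not token.startswith("<") or token.startswith(("<!", "<?")):
            continue
        if token.startswith("</"):
            kind, name = "end", token[2:-1].strip()
        else:
            kind, name = "start", token[1:-1].split()[0]
        # Step 2: hash the tag name. Step 3: augment the representation.
        structure.append((kind, hash_tag(name)))
    return structure


if __name__ == "__main__":
    print(parse('<a><b class="x">text</b></a>'))
```

Only the short numeric codes are stored, so later comparisons between tags operate on integers rather than on the full tag strings.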

2. A method according to claim 1, wherein said parsing is event-based parsing.

3. A method according to claim 1, wherein said hash representation is 
determined using one of: 

a hash algorithm; 

a first reference to said hash algorithm dependent upon an associated Universal 
Reference Indicator; 

a second reference to said hash algorithm dependent upon an associated 
namespace; and 

a third reference to said hash algorithm dependent upon an associated Extended 
Markup Language declaration.

4. A method according to claim 2, wherein said first type is one of: 
one of a structural element and a part thereof; 

a definition of said structural element; 
a declaration of said structural element; and
a match for said structural element. 

5. A method according to claim 4, wherein said structural element is a tag.

6. A method according to claim 2, wherein the hash representation is a
unique code for said one syntactic element, said element having less than a first number 
of characters. 

7. A method according to claim 2, wherein the hash representation is not a 
unique code for said one syntactic element, said element being constrained, to a
probability level, in terms of at least one of (i) a number of characters in the element and
(ii) a permissible number of permutations of characters in the element. 

8. A method according to claim 6, wherein said code comprises numeric
characters. 

9. A method according to claim 2, wherein said processing step comprises 
a sub-step of: 

determining an extended hash representation of both (i) said one syntactic 
element being a first instance of said first type, and (ii) another syntactic element being a 
second instance of said first type, within which said first instance, said second instance is 
nested. 

10. A method according to claim 1 comprising, for another one of said 
syntactic elements, further steps of: 

identifying a type of the other element; and if the type of the other element is
equivalent to said first type: 

(i) processing the other element to thereby determine a second hash
representation thereof; and 
(ii) augmenting said at least partial structural representation of the
document using the second hash representation, wherein: 

said processing and said second processing ensure that if a first relationship 
exists between the one element and the other element, then a second relationship which is
representative of the first relationship, exists between the hash representation of the one
element and the hash representation of the other element.

11. A method according to claim 10, wherein: 
the one element is a start tag;

the other element is an end tag;

the hash representation of the one element is a corresponding hashed start tag; and

the second hash representation of the other element is a corresponding hashed end tag.

12. A method according to claim 11, wherein: 
the end tag is a first modification of the start tag; and

the hashed end tag is a second modification of the hashed start tag, said second 
modification being representative of the first modification. 

13. A method according to claim 12, wherein: 

the end tag is the same as the start tag apart from having a distinguishing 
character incorporated therein; and 

the hashed end tag is at least one of: 
the same as the hashed start tag; 

the same as the hashed start tag apart from having a distinguishing character 
incorporated therein; and 

the hashed start tag having been processed by an operator. 
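
As a hedged illustration of claims 11 to 13, the relationship between a start tag and its end tag can be carried over to the hashed forms by applying a simple operator to the hashed start tag. The sketch below assumes that hashed start tags are positive integers and uses arithmetic negation as the operator, mirroring the "-629" style notation of the description; the function names are hypothetical.

```python
def hashed_end_tag(hashed_start_tag):
    """Derive the hashed end tag from the hashed start tag by negation.

    Assumes hashed start tags are strictly positive integers, so that the
    sign alone distinguishes a start tag from an end tag.
    """
    if hashed_start_tag <= 0:
        raise ValueError("hashed start tags are assumed to be positive")
    return -hashed_start_tag


def is_matching_pair(hashed_start, hashed_end):
    """A start/end pair matches when one is the negation of the other."""
    return hashed_start > 0 and hashed_end == -hashed_start


if __name__ == "__main__":
    print(hashed_end_tag(629), is_matching_pair(629, -629))
```

Because the end tag is derived from the start tag's hash, the end tag never needs to be hashed separately, and matching a pair reduces to one integer comparison.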



14. A method according to claim 12, wherein:
the one and the other element comprise respectively a start tag and an end tag,
being a first pair of tags; 

corresponding hashed start and end tags for said first pair of tags are 
incorporated into the partial structural representation of said document; 

the document further includes a second pair of tags comprising a respective start 
tag and end tag, said second pair of tags being nested within said first pair of tags in said
document, said method comprising further steps of: 

processing said second pair of tags to form corresponding second hashed start 
and end tags; 

augmenting said at least partial structural representation of the document using 
said corresponding second hashed start and end tags so that said second hashed start and 
end tags indicate a nesting in relation to said hashed start and end tags for the first pair of 
tags which is equivalent to the nesting of said second pair of tags within said first pair of 
tags. 

15. A method according to claim 14 comprising, prior to said augmenting 
step, a further step of: 

concatenating the first hashed start tag with the second hashed start tag, and 
concatenating the first hashed end tag with the second hashed end tag, to thereby form
respective extended hashed start and end tags for said second pair, wherein: 

said augmenting step is performed using said respective extended hashed start 
and end tags for said second pair, and: 

said extended hashed start and end tags indicate a nesting in relation to said 
hashed start and end tags for the first pair of tags which is equivalent to the nesting of said 
second pair of tags within said first pair of tags. 
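
The concatenation of claim 15 can be illustrated with a short sketch, offered under stated assumptions rather than as the claimed implementation. It assumes hashed start tags are positive integers and hashed end tags are their negations, and it writes each extended hash in the dotted form used in the description (for example "133.787.-629"). The function name `extend_hashed_tags` is hypothetical.

```python
def extend_hashed_tags(hashed_tags):
    """Prefix every hashed tag with the hashes of its enclosing start tags."""
    path = []       # hashed start tags that are currently open
    extended = []
    for code in hashed_tags:
        if code > 0:
            # Start tag: concatenate it onto the hashes of its ancestors.
            path.append(code)
            extended.append(".".join(str(c) for c in path))
        else:
            # End tag: it must close the most recently opened start tag.
            if not path or path[-1] != -code:
                raise ValueError(f"end tag {code} does not close the open tag")
            path.pop()
            extended.append(".".join([str(c) for c in path] + [str(code)]))
    return extended


if __name__ == "__main__":
    for line in extend_hashed_tags([133, 787, 629, -629, -787, -133]):
        print(line)
```

Each extended hash carries its own nesting context, so the depth and ancestry of a tag can be read from a single entry.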

16. A method according to claim 1, wherein the augmenting step is
succeeded by a well-formedness checking step against a syntactic rule, said
well-formedness checking step comprising checking said at least partial structural
representation of the markup language document against the syntactic rule by numerically
comparing corresponding hashed representations of elements in said at least partial
structural representation of the markup language document.

17. A method according to claim 16, wherein said numerically comparing 
step is succeeded by a further step of: 

string-comparing, in accordance with said syntactic rule, corresponding
non-hashed representations of elements not of said first type.
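
The numerical comparison of claim 16 can be sketched, for the proper-nesting rule only, with a stack of hashed start tags. This is a minimal sketch assuming positive integers for hashed start tags and their negations for hashed end tags; the function name is hypothetical, and the subsequent string comparison of claim 17 is not shown.

```python
def is_well_formed(hashed_tags):
    """Check proper nesting using integer comparisons only."""
    open_tags = []
    for code in hashed_tags:
        if code > 0:
            open_tags.append(code)          # hashed start tag
        elif not open_tags or open_tags.pop() != -code:
            return False                    # end tag closes the wrong tag
    return not open_tags                    # every start tag must be closed


if __name__ == "__main__":
    print(is_well_formed([133, 787, 629, -629, -787, -133]))
    print(is_well_formed([133, 411, -133, -411]))
```

Since distinct tag names may share a hash, a result of `True` from such a check indicates well-formedness only to the probability level discussed in claim 7.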

18. A method according to claim 1, wherein said first type is one of: 
one of a structural element and a part thereof; 

a definition of said structural element; 

a declaration of said structural element; and

a match of said structural element.

19. A method according to claim 18, wherein said structural element is a tag. 

20. A method according to claim 14, comprising a further step of: 
checking the well-formedness of said at least partial structural representation of
the document against a syntactic rule.

21. A method according to claim 20, wherein the syntactic rule relates to 
proper nesting of tags and said checking step comprises sub-steps of: 

performing a numerical comparison across hashed tags in said at least partial 
structural representation of the document to thereby identify said first hashed start and 
end tags and said second hashed start and end tags; and

verifying that the second hashed start and end tags indicate a proper nesting in 
relation to said first hashed start and end tags.

22. A method according to claim 21, wherein the numerical comparison is
followed by a further step of: 

performing a string comparison, in accordance with said syntactic rule, across 
non-hashed parts of respective tags in said at least partial structural representation of the
document.

23. A method according to claim 15, comprising a further step of: 
checking the well-formedness of said at least partial structural representation of
the document against a syntactic rule.

24. A method according to claim 23, wherein the syntactic rule relates to 
proper nesting of tags and said checking step comprises sub-steps of: 

performing a numerical comparison across hashed tags in said at least partial 
structural representation of the document to thereby identify said first hashed start and 
end tags and said extended hashed start and end tags; and 

verifying that the extended hashed start and end tags indicate a proper nesting in 
relation to said first hashed start and end tags. 

25. A method according to claim 24, wherein the numerical comparison is 
followed by a further step of: 

performing a string comparison across non-hashed parts of respective tags in said 
at least partial structural representation of the document. 

26. A method according to claim 16, wherein the well-formedness checking 
step is one of (a) succeeded by, (b) included in, and (c) replaced by a validation step 
against a validation reference document VRD, said validation step comprising sub-steps 
of:

(a) processing the VRD, said processing comprising, for a syntactic element in
the VRD, sub-sub-steps of:

(i) identifying a type of said syntactic element of the VRD; and 
(ii) processing the syntactic element by determining a hash
representation thereof if said type is said first type; and 

(b) checking said at least partial structural representation of the markup language 
document against the processed VRD, said checking comprising a sub-sub-step of 
numerically comparing corresponding hashed representations of elements. 

27. A method of validating a markup language document against a VRD, 
said method comprising steps of:

(a) processing the markup language document, for each document tag identified 
therein, if said document tag is not a first document tag in a corresponding markup 
language document tag hierarchy, said processing comprising steps of: 

(i) determining a hierarchy position of said document tag; 

(ii) determining an extended hashed representation of said document tag 
concatenated with a hashed representation of a previous document tag in the document 
tag hierarchy; and 

(iii) storing said extended hashed representation of said document tag if 
said document tag is more deeply nested than a previous document tag; 

(b) processing said VRD, for each tag identified therein, if said tag is not a first 
tag in a corresponding tag hierarchy, said processing comprising steps of: 

(i) determining a hierarchy position of said tag; 

(ii) determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) storing said extended hashed representation of said tag in a list; and 

(c) validating said markup language document if said extended hashed 
representation of said document tag is one of found in said list and is a valid subset of a
member of said list.
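
The validation of claim 27 can be sketched as a set-membership test on extended hashes. The sketch below is illustrative only: it assumes well-formed input, positive integers for hashed start tags and their negations for end tags, and it interprets "a valid subset of a member of said list" as a leading portion (prefix) of a stored extended hash. The names `extended_paths` and `validate` are hypothetical.

```python
def extended_paths(hashed_tags):
    """Collect the extended hash (root-to-tag path) of every start tag.

    Assumes the tag sequence is well formed. Every prefix of a stored path
    is itself stored, because each ancestor is recorded when it is opened.
    """
    path, paths = [], set()
    for code in hashed_tags:
        if code > 0:
            path.append(code)
            paths.add(tuple(path))
        else:
            path.pop()
    return paths


def validate(document_tags, vrd_tags):
    """The document is valid if each of its paths is permitted by the VRD."""
    return extended_paths(document_tags) <= extended_paths(vrd_tags)


if __name__ == "__main__":
    vrd = [1, 2, 3, -3, 4, -4, -2, 5, -5, -1]
    print(validate([1, 2, 3, -3, -2, -1], vrd))   # nesting allowed by the VRD
    print(validate([1, 3, -3, -1], vrd))          # tag 3 directly under tag 1
```

The sample VRD sequence follows the 01, 02, 03, -03, 04, -04, -02, 05, -05, -01 listing of the description, written as integers.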

28. A method of validating a markup language document against a VRD, 
said method comprising steps of: 






(a) processing said VRD, for each structural element identified therein, said 
processing comprising steps of: 

(i) determining syntactic attributes of said structural element; 

(ii) determining a hashed representation of said structural element; and 
(iii) storing said hashed representation and syntactic attributes of said
structural element in a structural representation of said VRD; and

(b) processing the markup language document, for each document structural 
element identified therein, said processing comprising steps of: 

(i) determining syntactic attributes of said document structural element; 

(ii) determining a hashed representation of said document structural
element; and

(iii) storing said hashed representation and syntactic attributes of said 
document structural element in a structural representation of the document; and 

(c) validating said markup language document if syntactic attributes and hashed 
representations of said each document structural element in the structural representation
of the document conform to corresponding syntactic attributes and hashed
representations in said structural representation of said VRD. 

29. A method according to claim 26, wherein said numerically comparing
step is succeeded by a further step of string-comparing corresponding non-hashed 
representations of elements not of said first type. 

30. A method according to claim 26, wherein said first type is one of: 
one of a structural element and a part thereof; 

a definition of said structural element; 
a declaration of said structural element; and
a match of said structural element.

31. A method according to claim 30, wherein said structural element is a tag.

32. A method of encoding a markup language document comprising
syntactic elements, said method comprising, for one of said syntactic elements, steps of: 

identifying a type of the syntactic element; and
processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type; 

(ii) determining a compressed representation thereof if said type is not
a first type; and

(iii) retaining the syntactic element. 

33. A method of decoding a markup language document comprising 
encoded syntactic elements, said method comprising, for one of said encoded syntactic 
elements, steps of: 

identifying a type of the encoded syntactic element; 
processing the encoded syntactic element by at least one of: 

(i) determining an inverse hash representation thereof if said type is a
first type;

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element.
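
The encoding and decoding of claims 32 and 33 can be illustrated as a round trip. The following is a sketch under stated assumptions: the "inverse hash representation" is obtained from a lookup table built during encoding, tags are hashed with a toy additive hash chosen only so that the example is easy to follow, and character data is compressed with zlib. None of these choices is specified by the claims, and all names are hypothetical.

```python
import zlib


def toy_hash(name, modulus=1000):
    """Toy hash for illustration: sum of code points, reduced modulo 1000."""
    return sum(ord(ch) for ch in name) % modulus


def encode(tokens, table):
    """Hash tags (first type) and compress character data (other type)."""
    encoded = []
    for kind, value in tokens:
        if kind == "tag":
            code = toy_hash(value)
            table[code] = value     # remembered so the hash can be inverted
            encoded.append(("tag", code))
        else:
            encoded.append(("text", zlib.compress(value.encode("utf-8"))))
    return encoded


def decode(encoded, table):
    """Invert the hash via the table and decompress character data."""
    decoded = []
    for kind, value in encoded:
        if kind == "tag":
            decoded.append(("tag", table[value]))
        else:
            decoded.append(("text", zlib.decompress(value).decode("utf-8")))
    return decoded


if __name__ == "__main__":
    table = {}
    tokens = [("tag", "quote"), ("text", "Goodnight, Hamlet."), ("tag", "/quote")]
    print(decode(encode(tokens, table), table) == tokens)
```

A table-based inverse is only reliable while no two tag names share a hash; a practical encoder would need a strategy for detecting or avoiding such collisions.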

34. An apparatus for parsing a markup language document comprising 
syntactic elements, said apparatus comprising: 

identifying means for identifying a type of the element; 

processing means for processing the element by determining a hash
representation thereof if said type is a first type; and 

augmenting means for augmenting an at least partial structural representation of 
the document using the hash representation if said type is said first type.

35. An apparatus for validating a markup language document against a
VRD, said apparatus comprising: 

(a) means for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said means comprising:

(i) means for determining a hierarchy position of said document tag; 

(ii) means for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in 
the document tag hierarchy; and 

(iii) means for storing said extended hashed representation of said
document tag if said document tag is more deeply nested than an extended hashed 
representation of a previous document tag; 

(b) means for processing said VRD, for each tag identified therein, if said tag is 
not a first tag in a corresponding tag hierarchy, said means comprising: 

(i) means for determining a hierarchy position of said tag; 

(ii) means for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) means for storing said extended hashed representation of said tag in
a list; and

(c) means for establishing whether said extended hashed representation of said 
document tag is one of to be found in said list, and is a valid subset of a member of said 
list, thereby validating said markup language document. 

36. An apparatus for validating a markup language document against a 
VRD, said apparatus comprising: 

(a) means for processing said VRD, for each structural element identified therein,
said means comprising: 

(i) means for determining syntactic attributes of said structural element;

(ii) means for determining a hashed representation of said structural
element; and

(iii) means for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) means for processing the markup language document, for each document 
structural element identified therein, said means comprising: 

(i) means for determining syntactic attributes of said document 
structural element; 

(ii) means for determining a hashed representation of said document 
structural element; and 

(iii) means for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and 

(c) means for comparing syntactic attributes and hashed representations of said 
each document structural element in the structural representation of the document to 
corresponding syntactic attributes and hashed representations in said structural 
representation of said VRD to thereby establish validity of the markup language 
document.

37. An apparatus for encoding a markup language document comprising 
syntactic elements, to form an at least partial structural representation of the document, 
said apparatus comprising:

means for identifying a type of the syntactic element; and 
means for processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type;

(ii) determining a compressed representation thereof if said type is not 

a first type; and 

(iii) retaining the syntactic element.

38. An apparatus for decoding a markup language document comprising 
encoded syntactic elements, said apparatus comprising:

means for identifying a type of the encoded syntactic element;
means for processing the encoded syntactic element by at least one of:

(i) determining an inverse hash representation thereof if said type is a

first type; 

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element. 

39. A computer program which is configured to make a computer execute a 
procedure to parse a markup language document comprising syntactic elements, said 
program comprising: 

code for identifying a type of an element; 

code for processing the element by determining a hash representation thereof if 
said type is a first type; and 

code for augmenting an at least partial structural representation of the document 
using the hash representation if said type is said first type. 

40. A computer program which is configured to make a computer execute a 
procedure to validate a markup language document against a VRD, said program 
comprising: 

(a) code for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said code comprising:

(i) code for determining a hierarchy position of said document tag; 

(ii) code for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in 
the document tag hierarchy; and 

(iii) code for storing said extended hashed representation of said 
document tag if said tag is more deeply nested than a previous document tag;

(b) code for processing said VRD, for each tag identified therein, if said tag is
not a first tag in a corresponding tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said tag; 

(ii) code for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) code for storing said extended hashed representation of said tag in a 

list; and 

(c) code for validating said markup language document if said extended hashed 
representation of said document tag is one of found in said list, and is a valid subset of a
member of said list. 

41. A computer program which is configured to make a computer execute a 
procedure to validate a markup language document against a VRD, said program 
comprising: 

(a) code for processing said VRD, for each structural element identified therein, 
said code comprising: 

(i) code for determining syntactic attributes of said structural element; 

(ii) code for determining a hashed representation of said structural 

element; and

(iii) code for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) code for processing the markup language document, for each document 
structural element identified therein, said code comprising: 

(i) code for determining syntactic attributes of said document structural 

element; 

(ii) code for determining a hashed representation of said document
structural element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and 



(c) code for validating said markup language document if syntactic attributes and 
hashed representations of said each document structural element in the structural 
representation of the document conforms to corresponding syntactic attributes and hashed 
representations in said structural representation of said VRD. 

42. A computer program which is configured to make a computer execute a 
procedure to encode a markup language document comprising syntactic elements, said 
program comprising: 

code for identifying a type of the syntactic element; and 
code for processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type; 

(ii) determining a compressed representation thereof if said type is not 

a first type; and 

(iii) retaining the syntactic element. 

43. A computer program which is configured to make a computer execute a 
procedure to decode a markup language document comprising encoded syntactic elements, 
said program comprising: 

code for identifying a type of the encoded syntactic element; 

code for processing the encoded syntactic element by at least one of: 

(i) determining an inverse hash representation thereof if said type is a 

first type; and 

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element. 

44. A computer program product including a computer readable medium
having recorded thereon a computer program which is configured to make a computer 
execute a procedure to parse a markup language document, said program comprising: 

code for identifying a type of the element;

code for processing the element by determining a hash representation thereof if
said type is a first type; and 

code for augmenting an at least partial structural representation of the document 
using the hash representation if said type is said first type.

45. A computer program product including a computer readable medium 
having recorded thereon a computer program which is configured to make a computer 
execute a procedure to validate a markup language document against a VRD, said 
program comprising: 

(a) code for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said document tag; 

(ii) code for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in
the document tag hierarchy; and 

(iii) code for storing said extended hashed representation of said 
document tag if said document tag is more deeply nested than a previous document tag; 

(b) code for processing said VRD, for each tag identified therein, if said tag is
not a first tag in a corresponding tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said tag; 

(ii) code for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) code for storing said extended hashed representation of said tag in a 

list; and 

(c) code for validating said markup language document if said extended hashed
representation of said document tag is one of found in said list and is a valid subset of a 
member of said list.

46. A computer program product including a computer readable medium
having recorded thereon a computer program which is configured to make a computer
execute a procedure to validate a markup language document against a VRD, said
program comprising: 

(a) code for processing said VRD, for each structural element identified therein, 
said code comprising: 

(i) code for determining syntactic attributes of said structural element;

(ii) code for determining a hashed representation of said structural

element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) code for processing the markup language document, for each document 
structural element identified therein, said code comprising: 

(i) code for determining syntactic attributes of said document structural 

element; 

(ii) code for determining a hashed representation of said document 
structural element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and 

(c) code for validating said markup language document if syntactic attributes and 
hashed representations of said each document structural element in the structural 
representation of the document conforms to corresponding syntactic attributes and hashed 
representations in said structural representation of said VRD. 

47. An at least partial structural representation of a markup language
document comprising syntactic elements, said at least partial representation having been
produced by a method comprising, for one of said syntactic elements, the steps of: 

identifying a type of the element; 

processing the element by determining a hash representation thereof if said type 
is a first type; and

augmenting an at least partial structural representation of the document using the
hash representation if said type is said first type. 

48. An apparatus for parsing a markup language document comprising 
syntactic elements, said apparatus comprising: 

a processor; 

a memory for storing (i) the document, and (ii) a program which is configured to
make the processor execute a procedure to parse the document;
said program comprising: 

(i) code for identifying a type of an element; 

(ii) code for processing the element by determining a hash 
representation thereof if said type is a first type; and 

(iii) code for augmenting an at least partial structural representation of 
the document using the hash representation if said type is said first type. 

49. An apparatus for validating a markup language document comprising 
syntactic elements against a VRD comprising syntactic elements, said apparatus 
comprising: 

(a) a processor; 

(b) a memory for storing (i) the document, (ii) said VRD, and (iii) a program 
which is configured to make the processor execute a procedure to validate the document; 

(c) said program comprising: 

(ca) code for processing the markup language document, for each 
document tag identified therein, if said document tag is not a first document tag in a 
corresponding markup language document tag hierarchy, said code comprising: 

(caa) code for determining a hierarchy position of said 

document tag; 

(cab) code for determining an extended hashed representation 
of said document tag concatenated with a hashed representation of a previous document 
tag in the document tag hierarchy; and

(cac) code for storing said extended hashed representation of
said document tag if said document tag is more deeply nested than a previous document 
tag; 

(cb) code for processing said VRD, for each tag identified therein, if 
said tag is not a first tag in a corresponding tag hierarchy, said means comprising: 

(cba) code for determining a hierarchy position of said tag; 

(cbb) code for determining an extended hashed representation
of said tag concatenated with a hashed representation of a previous tag in the 
corresponding tag hierarchy; and 

(cbc) code for storing said extended hashed representation of 

said tag in a list; and 

(cc) code for establishing whether said extended hashed representation 
of said document tag is one of to be found in said list, and is a valid subset of a member 
of said list, thereby validating said markup language document. 

50. An apparatus for validating a markup language document containing 
syntactic elements against a VRD containing syntactic elements, said apparatus 
comprising: 

(a) a processor;

(b) a memory for storing (i) the document, (ii) said VRD, and (iii) a program 
which is configured to make the processor execute a procedure to validate the document; 

(c) said program comprising: 

(ca) code for processing said VRD, for each structural element 
identified therein, said code comprising: 

(caa) code for determining syntactic attributes of said structural 

element; 

(cab) code for determining a hashed representation of said

structural element; and 

(cac) code for storing said hashed representation and syntactic 
attributes of said structural element in a structural representation of said VRD; and

(cb) code for processing the markup language document, for each
document structural element identified therein, said code comprising: 

(caa) code for determining syntactic attributes of said 
document structural element; 

(cab) code for determining a hashed representation of said 
document structural element; and 

(cac) code for storing said hashed representation and syntactic 
attributes of said document structural element in a structural representation of the 
document; and 

(cc) code for comparing syntactic attributes and hashed representations 
of said each document structural element in the structural representation of the document 
to corresponding syntactic attributes and hashed representations in said structural
representation of said VRD to thereby establish validity of the markup language 
document. 

51. A method of validating a markup language document against a VRD, 
said method comprising steps of: 

determining first extended hashed representation(s) for most deeply nested 
syntactic element(s) of a first type in the VRD; 

storing said first extended hashed representation(s) in a VRD list;

determining a second extended hashed representation for a most deeply nested 
syntactic element of the first type in the markup language document; and 

declaring said markup language document to not be invalid if said second 
extended hashed representation is present in the VRD list. 

52. A method according to claim 51, wherein said syntactic element of said 
first type is one of: 

one of a structural element and a part thereof; 

a definition of said structural element; 

a declaration of said structural element; and

a match for said structural element.

53. A method according to claim 52, wherein said structural element is a 

tag. 

54. A method according to claim 51, wherein: 

said most deeply nested syntactic element(s) in the VRD are syntactic
element(s) which are most deeply nested within one of the global structure of the VRD
and a local sub-structure of the VRD; and

said most deeply nested syntactic element in the markup language document is a
syntactic element which is most deeply nested within one of the global structure of the
markup language document and a local sub-structure of the markup language document. 

55. An apparatus for validating a markup language document against a 
VRD, said apparatus comprising: 

means for determining first extended hashed representation(s) for most deeply
nested syntactic element(s) of a first type in the VRD;

means for storing said first extended hashed representation(s) in a VRD list;

means for determining a second extended hashed representation for a most 
deeply nested syntactic element of the first type in the markup language document; and 

means for declaring said markup language document to not be invalid if said 
second extended hashed representation is present in the VRD list. 

56. A computer program which is configured to make a computer execute a 
procedure to validate a markup language document against a VRD, said program 
comprising: 

code for determining first extended hashed representation(s) for most deeply 
nested syntactic element(s) of a first type in the VRD; 

code for storing said first extended hashed representation(s) in a VRD list;

code for determining a second extended hashed representation for a most deeply
nested syntactic element of the first type in the markup language document; and

code for declaring said markup language document to not be invalid if said
second extended hashed representation is present in the VRD list. 

57. A computer program product including a computer readable medium 
having recorded thereon a computer program which is configured to make a computer 
execute a procedure to validate a markup language document against a VRD, said 
program comprising: 

code for determining first extended hashed representation(s) for most deeply
nested syntactic element(s) of a first type in the VRD;

code for storing said first extended hashed representation(s) in a VRD list; 

code for determining a second extended hashed representation for a most deeply 
nested syntactic element of the first type in the markup language document; and

code for declaring said markup language document to not be invalid if said 
second extended hashed representation is present in the VRD list. 

58. A method according to claim 7, wherein said code comprises numeric 
characters.

3. Detailed Description of the Invention

Copyright Notice 

This patent specification contains material that is subject to copyright protection. 
The copyright owner has no objection to the reproduction of this patent specification or 
related materials from associated patent office files for the purposes of review, but
otherwise reserves all copyright whatsoever. 

Technical Field of the Invention 

The present invention relates generally to processing of multimedia documents, 
and, in particular, to documents produced in mark-up language. The present invention 
relates to a method and apparatus for parsing documents in mark-up language. The 
invention also relates to a computer program and a computer program product including a 
computer readable medium having recorded thereon said computer program, which is 
configured to make a computer execute a procedure for parsing a document composed in
a mark-up language. 

Background Art 

Parsing is a process of extracting information from a document. The process
usually involves at least a minimum check of document syntax, and can in general yield 
either a tree structure description of the document, or a logical chain of events. The 
structural representation based on the logical chain of events is typically produced by an 
ordered parsing of the document from beginning to end. 

Tree-based parsers compile, for example, an XML document into an internal tree 
structure, providing a hierarchical model which applications are able to navigate. The 
Document Object Model (DOM) working group at the World-Wide Web consortium is 
presently developing a standard tree-based Application Programming Interface (API) for 
Extensible Markup Language (XML) documents. Event-based parsers, on the other hand,
report parsing events such as the start and end of elements directly to the application for 
which the parsing is being performed. This reporting is performed typically using 
callbacks, and does not require an internal tree structure. The application requiring the
parsing implements handlers to deal with the different events, much like handling events
in a graphical user interface.

Tree-based parsers are useful for a wide range of applications, but typically place
a strain on system resources, particularly if the document being parsed is large. 
Furthermore, applications sometimes need to build their own particular tree structures, 
and it is inefficient to build a tree representation, only to map it to a different 
representation. Event-based parsers provide a simpler, lower-level access to an XML 
document, facilitating parsing of documents larger than available system memory. The 
"Simple API for XML" (referred to as the SAX parser) is an event-driven interface for 
parsing XML documents. SAX parsers are discussed in more detail in relation to Figs.
2(a), 2(b), 3(a), 3(b) and 3(c). 
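
The event-driven style described above can be illustrated with a minimal sketch that uses the SAX interface in Python's standard library (xml.sax). The handler class EventLogger and the helper parse_events are hypothetical names chosen for this illustration and are not part of the specification. The handler receives start, end and character-data callbacks in document order and records them in a flat list, so no internal tree is built.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records parsing events in document order instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser when a start tag is encountered.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser when an end tag is encountered.
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several chunks; blank chunks are skipped.
        text = content.strip()
        if text:
            self.events.append(("text", text))


def parse_events(xml_text):
    """Parse a well-formed XML string and return the list of reported events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = parse_events("<banquo>Say <quote>goodnight</quote>, Hamlet.</banquo>")
```

The application only ever holds the current event and whatever it chooses to store, which is why this style suits documents larger than available memory.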

Figs. 1(a) and 1(b) show block representations of parser systems. The
following XML document fragment 106 is considered: 

105 <Shakespeare>

110 <!--This is a comment-->

115 <div class="preface" Name1="value1" name2="Value2">

120 <mult list=<> </mult>

125 <banquo>

130 Say

135 <quote>

140 goodnight </quote>,

145 Hamlet.</banquo>

150 <Hamlet><quote>Goodnight, Hamlet.</quote></Hamlet>

155 </Shakespeare> 

In Fig. 1(b), the XML document 106 is input into a parser 112 which, in the 
present instance, is an event based parser. Optionally, as indicated by a dashed box 108, a 
Document-Type-Definition (DTD) or an XML Schema is also input into the parser 112. 
The parser 112 outputs, as depicted by an arrow 114, a partial structural representation of 
the document 106 which can be a simple list. In Fig. 1(a), a Cascading Style Sheet (CSS)
or an Extensible Stylesheet Language (XSL) style sheet 104 is input into a CSS or XSL
parser 110. A DTD 102 can also be input into this parser 110. Both the XML parser 112
and the CSS/XSL parser 110 are event-driven parsers in the present illustration.

One of the benefits of mark-up languages such as XML is the facility to make 
documents smarter, more portable and more powerful, by enabling the use of tags to 
define various parts of the documents. This capability derives from the descriptive nature
of XML. XML documents can be customised on a per-subject basis, and accordingly,
customised tags can be used to make the documents comprehensible, in terms of the
structure, to a human reader. This very attribute, however, often leads to XML 
documents being verbose and large, and this poses a problem in some instances. For 
example, where XML documents must be parsed in a hardware-constrained piece of 
equipment, such as a printer, the typically memory intensive nature of conventional 
parsing is in conflict with the limited memory which can be accommodated in such 
equipment. Furthermore, the human readability of XML documents is typically of 
minimal benefit when the documents are processed by hardware constrained pieces of 
equipment. Furthermore, tag-string matching operations, which are required to a 
significant degree in XML document parsing, pose a sometimes unacceptable burden of 
processing requirements, translating into an unacceptable number of processor cycles. 
These problems apply to both parser instances shown in Figs. 1(a) and 1(b). 

Disclosure of the Invention 

It is an object of the present invention to substantially overcome, or at least 
ameliorate, one or more disadvantages of existing arrangements. 

According to a first aspect of the invention, there is provided a method of parsing 
a markup language document comprising syntactic elements, said method comprising, for 
one of said syntactic elements, the steps of: 

identifying a type of the element; 

processing the element by determining a hash representation thereof if said type 
is a first type; and 

augmenting an at least partial structural representation of the document using the 
hash representation if said type is said first type. 
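
The three steps of this first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser of the specification: tags are taken to be the "first type", the hash is CRC-32 from Python's zlib module (the text above does not name a hash function), and the names TOKEN, tag_hash and parse are hypothetical. Only plain start and end tags are handled; comments, self-closing tags and attribute values containing ">" are outside the sketch.

```python
import re
import zlib

# Splits markup into tags ("<...>") and runs of character data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def tag_hash(name):
    """Hash a tag name to an integer (CRC-32 is an assumed choice)."""
    return zlib.crc32(name.encode("utf-8"))


def parse(document):
    """Return a flat list serving as a partial structural representation."""
    structure = []
    for token in TOKEN.findall(document):
        if token.startswith("</"):
            # End tag: first type, so store its hash rather than its string.
            structure.append(("end", tag_hash(token[2:-1].strip())))
        elif token.startswith("<"):
            # Start tag: first type; attributes after the name are ignored here.
            name = token[1:-1].split()[0]
            structure.append(("start", tag_hash(name)))
        elif token.strip():
            # Character data: not the first type, so it is kept as text.
            structure.append(("text", token.strip()))
    return structure


representation = parse("<banquo>Say <quote>goodnight</quote>, Hamlet.</banquo>")
```

Matching a start tag against its end tag then reduces to comparing two integers instead of two strings, which is the processing saving the hash representation is aimed at.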

According to another aspect of the invention there is provided a method of 
validating a markup language document against a VRD, said method comprising steps of: 

(a) processing the markup language document, for each document tag identified 
therein, if said document tag is not a first document tag in a corresponding markup 
language document tag hierarchy, said processing comprising steps of: 

(i) determining a hierarchy position of said document tag;

(ii) determining an extended hashed representation of said document tag
concatenated with a hashed representation of a previous document tag in the document 
tag hierarchy; and 

(iii) storing said extended hashed representation of said document tag if
said document tag is more deeply nested than a previous document tag; 

(b) processing said VRD, for each tag identified therein, if said tag is not a first 
tag in a corresponding tag hierarchy, said processing comprising steps of: 

(i) determining a hierarchy position of said tag; 

(ii) determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) storing said extended hashed representation of said tag in a list; and 

(c) validating said markup language document if said extended hashed 
representation of said document tag is one of found in said list and is a valid subset of a 
member of said list.
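
One possible reading of steps (a) to (c) is sketched below. Several points are assumptions made for illustration only: the extended hashed representation is modelled as a tuple of tag hashes ordered from the outermost tag inwards, "valid subset of a member" is interpreted as a leading portion (prefix) of a stored tuple, the VRD is supplied as a skeleton of nested tags, and CRC-32 stands in for the unspecified hash. For brevity the sketch records an entry for every start tag, including the first. The names TAG, tag_hash, extended_hashes and is_valid are hypothetical.

```python
import re
import zlib

# Matches start and end tags; group 1 is "/" for an end tag, group 2 the name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")


def tag_hash(name):
    """Hash a tag name to an integer (CRC-32 is an assumed choice)."""
    return zlib.crc32(name.encode("utf-8"))


def extended_hashes(markup):
    """Collect one extended hash (tuple of hashes, outermost first) per start tag.

    Assumes well-formed markup without self-closing tags.
    """
    stack, found = [], []
    for closing, name in TAG.findall(markup):
        if closing:
            stack.pop()                  # leave the current nesting level
        else:
            stack.append(tag_hash(name))
            found.append(tuple(stack))   # this tag's hash joined to its ancestors'
    return found


def is_valid(document, vrd):
    """True if every document path is in the VRD list or is a prefix of a member."""
    vrd_list = extended_hashes(vrd)

    def acceptable(path):
        return any(member[:len(path)] == path for member in vrd_list)

    return all(acceptable(path) for path in extended_hashes(document))


VRD = "<play><act><scene></scene></act></play>"
ok = is_valid("<play><act></act></play>", VRD)
bad = is_valid("<play><scene></scene></play>", VRD)
```

Because distinct tag names can in principle hash to the same value, a successful match in a scheme of this kind shows only that the document has not been found invalid.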

According to another aspect of the invention there is provided a method of 
validating a markup language document against a VRD, said method comprising steps of: 

(a) processing said VRD, for each structural element identified therein, said 
processing comprising steps of: 

(i) determining syntactic attributes of said structural element;

(ii) determining a hashed representation of said structural element; and 

(iii) storing said hashed representation and syntactic attributes of said 
structural element in a structural representation of said VRD; and 

(b) processing the markup language document, for each document structural
element identified therein, said processing comprising steps of: 

(i) determining syntactic attributes of said document structural element; 

(ii) determining a hashed representation of said document structural
element; and

(iii) storing said hashed representation and syntactic attributes of said 
document structural element in a structural representation of the document; and

(c) validating said markup language document if syntactic attributes and hashed
representations of said each document structural element in the structural representation 
of the document conforms to corresponding syntactic attributes and hashed 
representations in said structural representation of said VRD. 

According to another aspect of the invention there is provided a method of 
encoding a markup language document comprising syntactic elements, said method 
comprising, for one of said syntactic elements, steps of: 

identifying a type of the syntactic element; and

processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type; 

(ii) determining a compressed representation thereof if said type is not 

a first type; and 

(iii) retaining the syntactic element. 
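
The three encoding routes listed above can be sketched as follows. The choices made here are assumptions for illustration: tags are treated as the first type and hashed with CRC-32, longer runs of character data are compressed with zlib, and short runs are retained unchanged. The threshold min_compress_len and the names TOKEN and encode are hypothetical and do not come from the specification.

```python
import re
import zlib

# Splits markup into tags ("<...>") and runs of character data.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def encode(document, min_compress_len=40):
    """Encode each syntactic element by exactly one of three routes."""
    encoded = []
    for token in TOKEN.findall(document):
        data = token.encode("utf-8")
        if token.startswith("<"):
            # (i) first type (here: a tag): store a hash representation.
            encoded.append(("hash", zlib.crc32(data)))
        elif len(data) >= min_compress_len:
            # (ii) not the first type: store a compressed representation.
            encoded.append(("zlib", zlib.compress(data)))
        else:
            # (iii) short character data is retained as it is.
            encoded.append(("raw", token))
    return encoded


long_text = "goodnight " * 10
encoded = encode("<quote>" + long_text + "</quote>, Hamlet.")
```

Retaining short elements avoids the case where the compression header would make the encoded form larger than the original text.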

According to another aspect of the invention there is provided a method of 
decoding a markup language document comprising encoded syntactic elements, said 
method comprising, for one of said encoded syntactic elements, steps of: 

identifying a type of the encoded syntactic element; 

processing the encoded syntactic element by at least one of: 

(i) determining an inverse hash representation thereof if said type is a 

first type; and 

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element.
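
A matching decoding sketch is given below. A hash cannot in general be inverted by computation, so this illustration assumes that the decoder holds a lookup table from hash values back to the tags they stand for; that table, the tuple layout of the encoded elements, and the names build_inverse_table and decode are all assumptions made for this example rather than details taken from the specification.

```python
import zlib


def build_inverse_table(known_tags):
    """Map each tag's hash back to the tag text (assumed to be known in advance)."""
    return {zlib.crc32(tag.encode("utf-8")): tag for tag in known_tags}


def decode(encoded, inverse_table):
    """Rebuild the markup from a list of (kind, payload) encoded elements."""
    parts = []
    for kind, payload in encoded:
        if kind == "hash":
            # (i) first type: inverse hash via table lookup.
            parts.append(inverse_table[payload])
        elif kind == "zlib":
            # (ii) not the first type: decompress the payload.
            parts.append(zlib.decompress(payload).decode("utf-8"))
        else:
            # (iii) the element was retained, so it is copied through.
            parts.append(payload)
    return "".join(parts)


table = build_inverse_table(["<quote>", "</quote>"])
sample = [
    ("hash", zlib.crc32(b"<quote>")),
    ("zlib", zlib.compress(b"goodnight")),
    ("hash", zlib.crc32(b"</quote>")),
    ("raw", ", Hamlet."),
]
decoded = decode(sample, table)
```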
According to another aspect of the invention there is provided an apparatus for 
parsing a markup language document comprising syntactic elements, said apparatus 
comprising: 

identifying means for identifying a type of the element;

processing means for processing the element by determining a hash
representation thereof if said type is a first type; and

augmenting means for augmenting an at least partial structural representation of
the document using the hash representation if said type is said first type. 

According to another aspect of the invention there is provided an apparatus for 
validating a markup language document against a VRD, said apparatus comprising: 

(a) means for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said means comprising: 

(i) means for determining a hierarchy position of said document tag; 

(ii) means for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in 
the document tag hierarchy; and 

(iii) means for storing said extended hashed representation of said
document tag if said document tag is more deeply nested than an extended hashed 
representation of a previous document tag; 

(b) means for processing said VRD, for each tag identified therein, if said tag is
not a first tag in a corresponding tag hierarchy, said means comprising:

(i) means for determining a hierarchy position of said tag; 

(ii) means for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) means for storing said extended hashed representation of said tag in 

a list; and 

(c) means for establishing whether said extended hashed representation of said 
document tag is one of to be found in said list, and is a valid subset of a member of said 
list, thereby validating said markup language document. 

According to another aspect of the invention there is provided an apparatus for 
validating a markup language document against a VRD, said apparatus comprising:

(a) means for processing said VRD, for each structural element identified therein, 
said means comprising: 

(i) means for determining syntactic attributes of said structural element;

(ii) means for determining a hashed representation of said structural

element; and 

(iii) means for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) means for processing the markup language document, for each document 
structural element identified therein, said means comprising: 

(i) means for determining syntactic attributes of said document 
structural element; 

(ii) means for determining a hashed representation of said document 
structural element; and 

(iii) means for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and

(c) means for comparing syntactic attributes and hashed representations of said 
each document structural element in the structural representation of the document to 
corresponding syntactic attributes and hashed representations in said structural 
representation of said VRD to thereby establish validity of the markup language 
document. 

According to another aspect of the invention there is provided an apparatus for 
encoding a markup language document comprising syntactic elements, to form an at least 
partial structural representation of the document, said apparatus comprising: 

means for identifying a type of the syntactic element; and 

means for processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type; 

(ii) determining a compressed representation thereof if said type is not 
a first type; and 

(iii) retaining the syntactic element. 

According to another aspect of the invention there is provided an apparatus for 
decoding a markup language document comprising encoded syntactic elements, said 
apparatus comprising: 

means for identifying a type of the encoded syntactic element; 

means for processing the encoded syntactic element by at least one of: 

(i) determining an inverse hash representation thereof if said type is a 
first type; 

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element. 
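
The encoding and decoding aspects above can be illustrated with a minimal sketch. It assumes that tags are the "first type", that character data is the type chosen for compression, and that an inverse hash is realised as a reverse table lookup. The names `TAG_CODES`, `encode_element` and `decode_element`, the marker letters, and the code values are hypothetical and are used for illustration only.

```python
import zlib

# Hypothetical preset tag domain and codes (illustrative values only).
TAG_CODES = {"Shakespeare": 133, "div": 326, "quote": 629}
CODE_TAGS = {code: tag for tag, code in TAG_CODES.items()}  # inverse hash table


def encode_element(kind, text):
    """Encode one syntactic element as a (marker, payload) pair."""
    if kind == "tag":                      # assumed "first type": hash it
        return ("H", TAG_CODES[text])
    if kind == "text":                     # assumed other type: compress it
        return ("Z", zlib.compress(text.encode("utf-8")))
    return ("R", text)                     # anything else: retain unchanged


def decode_element(marker, payload):
    """Invert encode_element for one encoded element."""
    if marker == "H":
        return CODE_TAGS[payload]          # inverse hash via table lookup
    if marker == "Z":
        return zlib.decompress(payload).decode("utf-8")
    return payload                         # retained element passes through
```

A table-based inverse is only possible because the tag domain is preset; a general hash function is not invertible.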

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to parse a markup 
language document comprising syntactic elements, said program comprising: 

code for identifying a type of an element; 

code for processing the element by determining a hash representation thereof if 
said type is a first type; and 

code for augmenting an at least partial structural representation of the document 
using the hash representation if said type is said first type. 

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to validate a 
markup language document against a VRD, said program comprising: 

(a) code for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said document tag; 

(ii) code for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in 
the document tag hierarchy; and 

(iii) code for storing said extended hashed representation of said 
document tag if said tag is more deeply nested than a previous document tag; 

(b) code for processing said VRD, for each tag identified therein, if said tag is 
not a first tag in a corresponding tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said tag; 

(ii) code for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) code for storing said extended hashed representation of said tag in a 
list; and 

(c) code for validating said markup language document if said extended hashed 
representation of said document tag is one of found in said list, and is a valid subset of a 
member of said list. 

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to validate a 
markup language document against a VRD, said program comprising: 

(a) code for processing said VRD, for each structural element identified therein, 
said code comprising: 

(i) code for determining syntactic attributes of said structural element; 

(ii) code for determining a hashed representation of said structural 
element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) code for processing the markup language document, for each document 
structural element identified therein, said code comprising: 

(i) code for determining syntactic attributes of said document structural 
element; 

(ii) code for determining a hashed representation of said document 
structural element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and 

(c) code for validating said markup language document if syntactic attributes and 
hashed representations of said each document structural element in the structural 
representation of the document conforms to corresponding syntactic attributes and hashed 
representations in said structural representation of said VRD. 

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to encode a markup 
language document comprising syntactic elements, said program comprising: 

code for identifying a type of the syntactic element; and 

code for processing the syntactic element by one of: 

(i) determining a hash representation thereof if said type is a first type; 

(ii) determining a compressed representation thereof if said type is not 
a first type; and 

(iii) retaining the syntactic element. 

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to decode a markup 
language document comprising encoded syntactic elements, said program comprising: 

code for identifying a type of the encoded syntactic element; 

code for processing the encoded syntactic element by at least one of: 

(i) determining an inverse hash representation thereof if said type is a 
first type; 

(ii) determining a decompressed representation thereof if said type is 
not a first type; and 

(iii) retaining the encoded syntactic element. 

According to another aspect of the invention there is provided a computer 
program product including a computer readable medium having recorded thereon a 
computer program which is configured to make a computer execute a procedure to parse a 
markup language document, said program comprising: 

code for identifying a type of the element; 

code for processing the element by determining a hash representation thereof if 
said type is a first type; and 

code for augmenting an at least partial structural representation of the document 
using the hash representation if said type is said first type. 

According to another aspect of the invention there is provided a computer 
program product including a computer readable medium having recorded thereon a 
computer program which is configured to make a computer execute a procedure to 
validate a markup language document against a VRD, said program comprising: 

(a) code for processing the markup language document, for each document tag 
identified therein, if said document tag is not a first document tag in a corresponding 
markup language document tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said document tag; 

(ii) code for determining an extended hashed representation of said 
document tag concatenated with a hashed representation of a previous document tag in 
the document tag hierarchy; and 

(iii) code for storing said extended hashed representation of said 
document tag if said document tag is more deeply nested than a previous document tag; 

(b) code for processing said VRD, for each tag identified therein, if said tag is 
not a first tag in a corresponding tag hierarchy, said code comprising: 

(i) code for determining a hierarchy position of said tag; 

(ii) code for determining an extended hashed representation of said tag 
concatenated with a hashed representation of a previous tag in the corresponding tag 
hierarchy; and 

(iii) code for storing said extended hashed representation of said tag in a 
list; and 

(c) code for validating said markup language document if said extended hashed 
representation of said document tag is one of found in said list and is a valid subset of a 
member of said list. 

According to another aspect of the invention there is provided a computer 
program product including a computer readable medium having recorded thereon a 
computer program which is configured to make a computer execute a procedure to 
validate a markup language document against a VRD, said program comprising: 

(a) code for processing said VRD, for each structural element identified therein, 
said code comprising: 

(i) code for determining syntactic attributes of said structural element; 



(ii) code for determining a hashed representation of said structural 
element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said structural element in a structural representation of said VRD; and 

(b) code for processing the markup language document, for each document 
structural element identified therein, said code comprising: 

(i) code for determining syntactic attributes of said document structural 
element; 

(ii) code for determining a hashed representation of said document 
structural element; and 

(iii) code for storing said hashed representation and syntactic attributes 
of said document structural element in a structural representation of the document; and 

(c) code for validating said markup language document if syntactic attributes and 
hashed representations of said each document structural element in the structural 
representation of the document conforms to corresponding syntactic attributes and hashed 
representations in said structural representation of said VRD. 

According to another aspect of the invention there is provided an at least partial 
structural representation of a markup language document comprising syntactic elements, 
said at least partial representation having been produced by a method comprising, for one 
of said syntactic elements, the steps of: 

identifying a type of the element; 

processing the element by determining a hash representation thereof if said type 
is a first type; and 

augmenting an at least partial structural representation of the document using the 
hash representation if said type is said first type. 

According to another aspect of the invention there is provided an apparatus for 
parsing a markup language document comprising syntactic elements, said apparatus 
comprising: 

a processor; 



a memory for storing (i) the document, and (ii) a program which is configured to 
make the processor execute a procedure to parse the document; 
said program comprising: 

(i) code for identifying a type of an element; 

(ii) code for processing the element by determining a hash 
representation thereof if said type is a first type; and 

(iii) code for augmenting an at least partial structural representation of 
the document using the hash representation if said type is said first type. 

According to another aspect of the invention there is provided an apparatus for 
validating a markup language document comprising syntactic elements against a VRD 
comprising syntactic elements, said apparatus comprising: 

(a) a processor; 

(b) a memory for storing (i) the document, (ii) said VRD, and (iii) a program 
which is configured to make the processor execute a procedure to validate the document; 

(c) said program comprising: 

(ca) code for processing the markup language document, for each 
document tag identified therein, if said document tag is not a first document tag in a 
corresponding markup language document tag hierarchy, said code comprising: 

(caa) code for determining a hierarchy position of said 
document tag; 

(cab) code for determining an extended hashed representation 
of said document tag concatenated with a hashed representation of a previous document 
tag in the document tag hierarchy; and 

(cac) code for storing said extended hashed representation of 
said document tag if said document tag is more deeply nested than a previous document 
tag; 

(cb) code for processing said VRD, for each tag identified therein, if 
said tag is not a first tag in a corresponding tag hierarchy, said code comprising: 

(cba) code for determining a hierarchy position of said tag; 

(cbb) code for determining an extended hashed representation 
of said tag concatenated with a hashed representation of a previous tag in the 
corresponding tag hierarchy; and 

(cbc) code for storing said extended hashed representation of 
said tag in a list; and 

(cc) code for establishing whether said extended hashed representation 
of said document tag is one of to be found in said list, and is a valid subset of a member 
of said list, thereby validating said markup language document. 

According to another aspect of the invention there is provided an apparatus for 
validating a markup language document containing syntactic elements against a VRD 
containing syntactic elements, said apparatus comprising: 

(a) a processor; 

(b) a memory for storing (i) the document, (ii) said VRD, and (iii) a program 
which is configured to make the processor execute a procedure to validate the document; 

(c) said program comprising: 

(ca) code for processing said VRD, for each structural element 
identified therein, said code comprising: 

(caa) code for determining syntactic attributes of said structural 
element; 

(cab) code for determining a hashed representation of said 
structural element; and 

(cac) code for storing said hashed representation and syntactic 
attributes of said structural element in a structural representation of said VRD; and 

(cb) code for processing the markup language document, for each 
document structural element identified therein, said code comprising: 

(cba) code for determining syntactic attributes of said 
document structural element; 

(cbb) code for determining a hashed representation of said 
document structural element; and 

(cbc) code for storing said hashed representation and syntactic 
attributes of said document structural element in a structural representation of the 
document; and 

(cc) code for comparing syntactic attributes and hashed representations 
of said each document structural element in the structural representation of the document 
to corresponding syntactic attributes and hashed representations in said structural 
representation of said VRD to thereby establish validity of the markup language 
document. 

According to another aspect of the invention there is provided a method of 
validating a markup language document against a VRD, said method comprising steps of: 

determining first extended hashed representation(s) for most deeply nested 
syntactic element(s) of a first type in the VRD; 

storing said first extended hashed representation(s) in a VRD list; 

determining a second extended hashed representation for a most deeply nested 
syntactic element of the first type in the markup language document; and 

declaring said markup language document to not be invalid if said second 
extended hashed representation is present in the VRD list. 
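
One possible reading of this method can be sketched as follows. The sketch assumes that an "extended hashed representation" is the sequence of hash codes along the path from the root tag to the tag in question, that the VRD can be modelled as a nested dictionary of permitted child tags, and that the document arrives as start and end tag events. The names `CODES`, `VRD`, `vrd_leaf_paths`, `deepest_document_path` and `not_invalid` are hypothetical.

```python
# Hypothetical tag codes; any collision-free mapping over the tag domain would do.
CODES = {"Shakespeare": 133, "div": 326, "banquo": 787, "quote": 629}

# Hypothetical VRD modelled as a nested dict: tag -> permitted child tags.
VRD = {"Shakespeare": {"div": {}, "banquo": {"quote": {}}}}


def vrd_leaf_paths(tree, prefix=()):
    """Collect root-to-leaf code tuples for the most deeply nested VRD tags."""
    paths = set()
    for tag, children in tree.items():
        path = prefix + (CODES[tag],)
        if children:
            paths |= vrd_leaf_paths(children, path)
        else:
            paths.add(path)
    return paths


def deepest_document_path(events):
    """Return the code tuple of the first most deeply nested tag in the document."""
    stack, deepest = [], ()
    for kind, tag in events:               # events: ("start" | "end", tag name)
        if kind == "start":
            stack.append(CODES[tag])
            if len(stack) > len(deepest):
                deepest = tuple(stack)
        else:
            stack.pop()
    return deepest


def not_invalid(events, vrd_list):
    """True if the document's deepest path appears in the VRD list."""
    return deepest_document_path(events) in vrd_list
```

The result is deliberately named `not_invalid`: presence in the VRD list rules out one class of structural error, but does not by itself establish full validity.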

According to another aspect of the invention there is provided an apparatus for 
validating a markup language document against a VRD, said apparatus comprising: 

means for determining first extended hashed representation(s) for most deeply 
nested syntactic element(s) of a first type in the VRD; 

means for storing said first extended hashed representation(s) in a VRD list; 

means for determining a second extended hashed representation for a most 
deeply nested syntactic element of the first type in the markup language document; and 

means for declaring said markup language document to not be invalid if said 
second extended hashed representation is present in the VRD list. 

According to another aspect of the invention there is provided a computer 
program which is configured to make a computer execute a procedure to validate a 
markup language document against a VRD, said program comprising: 

code for determining first extended hashed representation(s) for most deeply 
nested syntactic element(s) of a first type in the VRD; 

code for storing said first extended hashed representation(s) in a VRD list; 

code for determining a second extended hashed representation for a most deeply 
nested syntactic element of the first type in the markup language document; and 

code for declaring said markup language document to not be invalid if said 
second extended hashed representation is present in the VRD list. 

According to another aspect of the invention there is provided a computer 
program product including a computer readable medium having recorded thereon a 
computer program which is configured to make a computer execute a procedure to 
validate a markup language document against a VRD, said program comprising: 

code for determining first extended hashed representation(s) for most deeply 
nested syntactic element(s) of a first type in the VRD; 

code for storing said first extended hashed representation(s) in a VRD list; 

code for determining a second extended hashed representation for a most deeply 
nested syntactic element of the first type in the markup language document; and 

code for declaring said markup language document to not be invalid if said 
second extended hashed representation is present in the VRD list. 

Detailed Description including Best Mode 

Where reference is made in any one or more of the accompanying drawings to 
steps and/or features, which have the same reference numerals, those steps and/or features 
have for the purposes of this description the same function(s) or operation(s), unless the 
contrary intention appears. 

The inventive concept disclosed in this specification is based on the idea that 
memory requirements of an XML parser can be reduced, and various performance metrics 
can be improved, by performing a "perfect" hash of the XML tags, and possibly other 
elements within an XML file. A hash function is a function, mathematical or otherwise, 
that takes an input string, and converts it to an output code number called a hash value. A 
perfect hash function is one which creates a unique code number for a unique input string 
within a preset domain. The input string can be composed, for example, of alphanumeric 
characters, or other characters approved by the World Wide Web Consortium, and must 
be less than a certain length dictated by the hash process specifics. Alternatively, or in 
addition, the input string can be constrained in other ways, for example in terms of a 
probability of code number collision based on input context. This idea allows an arbitrary 
XML tag to be treated as a numeral or code, which can be stored in numeric form in 
memory. Since a parser normally preserves some portion of an XML structure in 
memory as the structure is parsed, conversion of XML tags to unique numerals allows 
memory requirements to be reduced, and furthermore, allows string-to-string comparisons 
to be replaced with equivalent, but much faster numerical comparisons. 
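
As a minimal illustration of a perfect hash over a preset domain, the following sketch simply assigns each distinct tag name its own integer. The function name `build_perfect_hash` and the sequential codes are assumptions made for the example. The specification does not prescribe this particular construction.

```python
def build_perfect_hash(tags):
    """Assign each distinct tag in a preset domain its own integer code."""
    table = {}
    for tag in tags:
        if tag not in table:
            table[tag] = len(table) + 1    # codes 1, 2, 3, ... in first-seen order
    return table


DOMAIN = ["Shakespeare", "div", "mult", "banquo", "quote", "Hamlet"]
TABLE = build_perfect_hash(DOMAIN)

# "Perfect" over the domain: as many distinct codes as distinct tags.
assert len(set(TABLE.values())) == len(TABLE)

# A tag comparison becomes an integer comparison.
same = TABLE["quote"] == TABLE["quote"]
different = TABLE["quote"] == TABLE["Hamlet"]
```

The uniqueness guarantee holds only for tag names inside the preset domain; a tag outside it has no code in this sketch and would need separate handling.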

The principles of the arrangements described herein have general applicability to 
parsing documents using a wide variety of mark-up languages. For ease of explanation, 
the disclosed arrangements are described with reference to the XML language. This is 
not intended, however, to limit the scope of the inventive concept. For example, the 
disclosed arrangements can also be applied to a UTF-16 transformation format (see 
International Standard ISO/IEC 10646-1 for further details of UTF-16). 

Figs. 2(a) and 2(b) depict a prior art SAX parser process 236, which supports 
optional well-formedness and/or validation checking sub-processes. 

In Fig. 2(a), a mark-up document, in the present case an XML document, is 
opened in an initial step 200. Thereafter, a decision step 202 tests whether the document 
contains any unprocessed (i.e. unparsed) characters, and if this is the case, a character is 
read and stored in a string in a following step 204. If further characters are, however, not 
detected in the testing step 202, the parsing process 236 terminates in a step 234. 

Following the step 204, a check is performed in a testing step 206 to determine 
whether a complete syntactic element has yet been assembled, and if so, the parser 
process 236 proceeds to a "Syntactic Type" identification step 210. If, on the other hand, 
a complete syntactic element has not yet been assembled, the parser process 236 is 
directed to a decision block 208 which determines if any further characters are available 
in the document. If additional characters are available, the parser process 236 is directed 
according to a "yes" arrow back to the step 204. Alternatively, if no more characters are 
available, then the process 236 is directed in accordance with a "no" arrow to the 
syntactic element type identification step 210. 

The "type identification" step 210 identifies a "type" for the assembled syntactic 
element, after which the element string is placed, in a step 212, into a memory 
representation of the document structure, thereby augmenting the representation as it has 
been assembled to this point. The memory representation of the document structure, 
which is typically, in the case of event driven parsers, a partial structural representation of 
the document, can be a simple list. 
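
The character-by-character assembly loop of the steps 202 to 212 can be sketched roughly as below. This is a simplified illustration only: it recognises just tags delimited by "<" and ">" and runs of character data, ignores attributes, comments and entities, and the function name `assemble_elements` is hypothetical.

```python
def assemble_elements(document):
    """Read characters one at a time and yield (type, string) elements."""
    buffer = ""
    for ch in document:                    # steps 202/204: read next character
        if ch == "<":
            if buffer.strip():             # a text run is complete before a tag
                yield ("text", buffer.strip())
            buffer = ch
        elif ch == ">" and buffer.startswith("<"):
            yield ("tag", buffer[1:])      # complete tag, delimiters dropped
            buffer = ""
        else:
            buffer += ch
    if buffer.strip():                     # no more characters: flush remainder
        yield ("text", buffer.strip())


structure = []                             # step 212: a simple list of strings
for kind, value in assemble_elements("<a><b>hi</b></a>"):
    structure.append(value)
```

Every element is stored here in its original string form, which is the source of the memory and comparison costs discussed later in the description.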

After the step 212, the process 236 is directed to a testing step 242, which 
determines whether a well-formedness check is to be performed. Well-formedness 
checks ensure that the document meets appropriate "well-formedness constraints", as 
defined on page 5 of "Extensible Markup Language (XML) 1.0 (Second Edition) W3C 
Recommendation, 6 October 2000", which is available on the Internet at 
http://www.w3.org/TR/2000/REC-xml-20001006.html. Well-formedness checks test the 
document for compliance with general structure rules, particularly whether tags in a 
document have been properly nested. If such a check is to be performed, then the process 
236 is directed in accordance with a "yes" arrow to "a" on a dashed boundary line 246. 
The dashed boundary line 246, along with reference letters "a" to "d" is mirrored by a 
corresponding boundary line in Fig. 2(b), in relation to which the process 236 is further 
described. If the well-formedness check is not to be performed, then the process 236 is 
directed in accordance with a "no" arrow from the testing step 242 to a testing step 244 
which determines whether a "validation check" is to be performed. Validation checks 
involve a comparison of syntactic elements in a document against validity constraints 
defined in a Validation Reference Document (referred to as a VRD for the sake of 
brevity) such as a document type definition (DTD), as described in Section 5.1 of the 
aforementioned W3C Recommendation. DTDs and XML Schemas are examples of 
VRDs against which validation checks can be performed; however validation checks as 
described herein can be performed against other types of VRDs. This comparison 
procedure verifies correct syntactic placement of elements to a greater extent than the 
mere well-formedness check. If the validation check is to be performed, then the process 
236 is directed in accordance with a "yes" arrow to "b" on the dashed boundary line 246. 
If, on the other hand, the validation check is not to be performed, then the process 236 is 
directed in accordance with a "no" arrow to "c" on the dashed boundary line 246. 

If the well-formedness check is elected, then the process 236 is directed from "a" 
on the boundary line 246 to an optional sub-process 238, and in particular to a well- 
formedness checking step 214 found therein. The optional nature of the process 238 is 
denoted by the dashed rectangle outline thereof. If the validity check is elected, then the 
process 236 is directed from "b" on the boundary line 246 to an optional sub-process 240, 
and in particular to a validity checking step 220 found therein. The optional nature of the 
process 240 is denoted by the dashed rectangle outline thereof. If the validity check is not 
elected, then the process 236 is directed from "c" on the boundary line 246 to an action 
selection step 226. 

If the well-formedness check is elected, then after the well-formedness step 214, 
if an error is detected in the following error checking step 216, corrective action and/or 
error indication takes place as indicated by an arrow 218. If, on the other hand, no errors 
are detected, then the parser process 236 is directed from the step 216 to the sub-process 
240, in which the validation check is performed in the step 220. As noted, the parsing 
process 236 can be directed to the validation checking step 220 either from the error 
checking step 216, or alternatively, the well-formedness checking sub-process 238 can be 
bypassed, and the process 236 can be directed directly to the validation checking step 
220 from "b" on the boundary line 246. The optional well-formedness sub-process 238 
can be bypassed if the appropriate decisions are made in the testing steps 242 and 244 
(see Fig. 2(a)). 

As noted, the validation checking step 220 involves a comparison of the 
identified syntactic element in the markup document being considered against a document 
type definition (DTD). This comparison procedure verifies correct syntactic placement of 
elements to a greater extent than the mere well-formedness check described in relation to 
the sub-process 238. 

Following the validation step 220, if an error is detected in an error checking step 
222, corrective action is taken, and/or an error indication is produced, as depicted by an 
arrow 224. Alternatively, if no error is detected, the parser process 236 proceeds to the 
action selection step 226, where an action is selected based upon the type of the syntactic 
element being considered. The optional sub-processes 238 and 240 can both be bypassed, 
if the appropriate decisions are made at the decision steps 242 and 244 in Fig. 2(a). If 
both of the aforementioned sub-processes are bypassed, then as noted the parsing process 
236 is directed from "c" on the boundary line 246 directly to the action selection step 226. 

If the syntactic element is a tag, then as depicted by an arrow 228 the tag value, 
or a representative string, is sent to the application in respect of which the parsing process 
is being performed, and a memory representation of the tag is maintained. If, on the other 
hand, the element type is a non-tag type, then as depicted by an arrow 230, the element 
value string is sent to the associated application, and the memory representation of the 
element is deleted. Finally, the parsing process 236 is directed, as depicted by an arrow 
232, to "d" on the dashed boundary line in Fig. 2(b), and from "d" on the corresponding 
dashed boundary line 246 in Fig. 2(a) to the character testing step 202. 

Significant memory requirements arise from the verbose nature of the XML 
document, resulting in correspondingly significant memory requirements to store the 
document structure in its original string form. This document structure is referred to in 
the step 212. Furthermore, an associated significant processing load, relating to 
performance of string comparisons between variable length alpha-numeric strings, arises 
both in the well-formedness checking step 214, and in the validation checking step 220. 

A partial memory representation of the document must typically be stored, and 
string checking must typically be performed, both (i) in relation to the step 214 in regard 
to checking for closure of hierarchy branches, namely matching end tags to start tags, and 
also for checking for non-overlapping branches, and (ii) in relation to the step 220, in 
which similar processes are required as in (i), as well as checking conformity of structure 
and tag names against the DTD. 

A parser must normally preserve some portion of an XML structure in memory 
as the XML structure is parsed. Even for a SAX parser, a local portion of the XML 
structure must be retained in memory for correct operation. If however each XML tag is 
converted to a unique numeral using a hash function, memory requirements are typically 
reduced, since the numeral resulting from the hash operation is smaller than the 
associated arbitrary-length XML tag string. Furthermore, string-to-string comparisons, 
required for matching beginning and end tags, can be replaced with much faster numerical 
comparisons, thereby reducing the processing load. 
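
The replacement of string comparisons by numerical comparisons when matching start and end tags can be sketched as a stack of integer codes. The function name `is_well_nested`, the event format and the `codes` mapping are assumptions made for illustration.

```python
def is_well_nested(events, codes):
    """Check tag nesting using integer codes rather than tag strings."""
    stack = []
    for kind, tag in events:               # kind is "start" or "end"
        code = codes[tag]                  # hash once, compare integers after
        if kind == "start":
            stack.append(code)
        elif not stack or stack.pop() != code:
            return False                   # end tag does not match the open tag
    return not stack                       # every start tag was closed
```

Only the integer codes are held on the stack, so the retained portion of the structure occupies a fixed number of bytes per open tag regardless of tag name length.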

Typical hash algorithms include (i) Cyclic Redundancy Coding (CRC) 
algorithms (commonly used for signature analysis or error-detection/correction in data 
transfer & storage), (ii) fully lossless encoding algorithms, and (iii) Huffman encoding 
algorithms. 
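
As an illustration of option (i), the following sketch uses the CRC-32 routine from the Python standard library as the hash. The helper names `crc_hash` and `is_collision_free` are hypothetical. CRC-32 is not guaranteed to be collision-free for arbitrary input, so the sketch checks uniqueness over the tag domain actually in use.

```python
import zlib


def crc_hash(tag):
    """Return a 32-bit CRC of the tag name as its hash code."""
    return zlib.crc32(tag.encode("utf-8"))


def is_collision_free(tags):
    """True if crc_hash yields distinct codes for the distinct tags given."""
    distinct = set(tags)
    return len({crc_hash(t) for t in distinct}) == len(distinct)
```

If the check fails for a given domain, a different algorithm or parameterisation would have to be chosen for the hash to remain "perfect" over that domain.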

Typically, a suitable hash algorithm must be static in its operation, or in other 
words it must always return the same hash result for the identical input conditions over 
the required set of data. The required set of data can, however, vary according to the 
circumstance. The data set can thus typically comprise at least an entire markup 
document, but can also include a relevant DTD or XML Schema, linked markup 
documents, and related or linked markup documents in different languages, eg a CSS 
document referenced by an XML document. A non-static hash algorithm can, however, be 
used where necessary by resetting the algorithm whenever tag syntax is encountered, for 
example whenever the non-literal "<" character is found in an XML document. The hash 
algorithm can also be reset where an <!ELEMENT string is found in an XML DTD 
document, or wherever a valid tag selector is permitted in a CSS document. 

A reference in an input markup document can be used to signal, or to select a 
suitable hash algorithm. This can be done in much the same way as markup documents 
can reference other markup documents, DTDs, stylesheets, character encodings, 
namespaces, and so on. For example a particular hash algorithm can be identified with a 
particular namespace, thereby permitting indirect reference to a hash algorithm via a 
namespace reference within a document. A hash algorithm implementation can be wholly, 
or partially included within a markup document, along with associated parameterisation. 
Such methods of referencing or including hash algorithms can be useful for optimisation 
purposes, where different hash methods have been optimised for use with particular 
markup documents, thereby improving performance and memory usage in destination 
devices or systems. Alternatively, the aforementioned referencing methods can be useful 
for matching purposes. This refers to applications involving one or more markup 
documents, where error checking or completion of parsing or other functions are required, 
and where one or more other documents (e.g. a DTD) have already been hashed by the 
same algorithm. 

Further refinements are possible in the above approach, for example involving 
optional hashing of DTDs. This reduces Read Only Memory (ROM) requirements for 
storing DTDs, and provides for faster validation processing of XML documents, by 
allowing comparison of numerical values rather than (slower) string comparisons. 

Figs. 3(a), 3(b) and 3(c) illustrate one arrangement of an improved SAX parser 
process 344. In Fig. 3(a), steps 300, to 310 are identical to corresponding steps 200 to 
210 which have been described in relation to Fig. 2(a). After the step 310, the assembled 
syntactic element is tested to ascertain its nature as a tag, or another element type, in a 
testing step 312. If the element is a tag, the parsing process 344 is directed to a hash step 
318 by an arrow 316. The hash step 318 determines, using respective processors 414 or 
505 in Figs. 5 and 6, a unique numeric representation of the syntactic element. This 
results in a more memory efficient representation of the element, which also lends itself to 
simpler and faster comparison operations in the numeric, rather than the alpha-numeric 
domain. Both the element string depicting the syntactic element, and the hash value 
thereof, are retained at this point of the process 344; however, it is the hash value, and not 
the string value, which is inserted, in the step 318, into the memory representation of the 
document structure using respective memories 418 and 506 (see Figs. 5 and 6). 

In order to better appreciate operation of the parsing process 344 as described in 
relation to Figs. 3(a), 3(b) and 3(c), parsing of the exemplary XML fragment [1] is 
considered firstly in relation to the parsing process 236 described in relation to Fig. 2. In 
this case, the XML fragment [1] yields the following hierarchical representation of parsed 
mark-up tags in the sub-process 212: 

205    Shakespeare 
215        div 
220        mult 
221        /mult 
225        banquo 
235            quote 
240            /quote                 [2] 
245        /banquo 
250        Hamlet 
251            quote 
252            /quote 
253        /Hamlet 
255    /Shakespeare 



In contrast, the differentiated treatment of tag elements and non-tag elements in 
the parsing process 344, as described in relation to Fig. 3(a), results in an equivalent 
hierarchical representation being generated by the step 318. The equivalent hierarchical 
representation is depicted in [3]. The hierarchical representation in [3] is made up of 
parsed hashed mark-up tags. For the sake of this example, a domain of tag names is 
constrained to those shown in the following Table 1, and a hash mapping (which is 
functionally equivalent to application of a hash "function") is shown in the following 
table: 



Tag            Hash Code Number 
Shakespeare    133 
Div            326 
Mult           371 
Banquo         787 
Quote          629 
Hamlet         411 

Table 1. Hash Mapping 

Based on the above hash mapping, the following hierarchical representation of 
the XML fragment shown in [1] results: 

205    133 
215        326 
222        371 
223        /371 
225        787 
235            629 
240            /629                   [3] 
245        /787 
254        411 
255            629 
256            /629 
257        /411 
255    /133 



Returning to Fig. 3(a), the parsing process 344 is directed from the step 314 to 
"a" on a dashed boundary line 356. The dashed boundary line 356, along with reference 
letters "a" and "b" is mirrored by a corresponding boundary line in Fig. 3(b), in relation to 
which the process 344 is further described. 

Turning to Fig. 3(b), the process 344 continues from "a" on the dashed boundary 
line 356 to a testing step 350 which determines whether a well-formedness check is to be 
performed. If such a check is to be performed, then the process 344 is directed in 
accordance with a "yes" arrow to "c" on the boundary line 358. The dashed boundary 
line 358, along with reference letters "c" to "f", is mirrored by a corresponding boundary 
line in Fig. 3(c), in relation to which the process 344 is further described. If the well- 
formedness check is not to be performed, then the process 344 is directed in accordance 
with a "no" arrow to a testing step 352 which determines whether a validation check is to 
be performed. If the validation check is to be performed, then the process 344 is directed 
in accordance with a "yes" arrow to "e" on the dashed boundary line 358. If, on the other 
hand, the validation check is not to be performed, then the process 344 is directed to "d" 
on the dashed boundary line 358. 

Turning to Fig. 3(c), if the well-formedness check is to be performed, then the 
process 344 is directed from "c" on the dashed boundary line 358 to a well-formedness 
checking step 320. If, on the other hand, the well-formedness check is not elected, then 
the process 344 is directed from "e" on the dashed boundary line 358 to a validation 
checking step 326. If neither a well-formedness check nor a validation check is elected, then 
the process 344 is directed from "d" on the dashed boundary line 358 to an action 
selection step 334. 

The well-formedness checking step 320 performs well-formedness checking 
using respective processors 414 or 505 and forms part of an optional process 346. The 
optional nature of the process 346 is depicted by use of dashed lines. Similarly, the 
validation step 326 forms part of an optional sub-process 348, the optional nature thereof 
being depicted by use of dashed lines. 

It is apparent that the hierarchical representation depicted in [3] allows string 
comparisons to be replaced by faster and more efficient numerical comparisons, thereby 
reducing the associated computational burden. Furthermore, the hierarchical 
representation shown in [3] is a more memory-efficient representation than that shown in 
[1] and accordingly the representation shown in [3] is more suited to memory-constrained 
applications as previously discussed. 
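
The relationship between representations [2] and [3] can be illustrated with a minimal sketch, given here in Python purely for illustration. The sketch assumes that the hash mapping is simply the lookup of Table 1, that tag names are compared without regard to case, and that an end tag keeps its leading "/" in front of the hash numeral. The names HASH_TABLE, hash_tag, TAGS and HASHED are hypothetical and do not form part of the described arrangement.

```python
# Hash mapping of Table 1 (assumption: tag names are lower-cased for lookup).
HASH_TABLE = {
    "shakespeare": 133,
    "div": 326,
    "mult": 371,
    "banquo": 787,
    "quote": 629,
    "hamlet": 411,
}


def hash_tag(tag: str) -> str:
    """Return the hashed form of a tag, keeping a leading "/" on end tags."""
    is_end = tag.startswith("/")
    name = tag.lstrip("/").lower()
    code = HASH_TABLE[name]  # raises KeyError for a tag outside the domain
    return f"/{code}" if is_end else str(code)


# Tag sequence of representation [2], in document order.
TAGS = ["Shakespeare", "div", "mult", "/mult", "banquo", "quote", "/quote",
        "/banquo", "Hamlet", "quote", "/quote", "/Hamlet", "/Shakespeare"]

# Hashed sequence corresponding to representation [3].
HASHED = [hash_tag(t) for t in TAGS]
```

Once the tags are held in this form, a comparison of two tags reduces to a comparison of two small integers rather than of two character strings.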

Returning to Fig. 3(c), if well-formedness checking is elected, then after 
well-formedness checking is performed in the step 320, the parsing process 344 is directed to 
an error checking step 322, whereupon if an error is detected, as depicted by an arrow 324, 
corrective action is taken, and/or an error is indicated. The well-formedness check 
typically considers whether tags in a document have been properly nested. Thus, for 
example, having reference to [2] the tag pair "Hamlet" and "/Hamlet" are properly nested 
within the tag pair "Shakespeare" and "/Shakespeare" since the "Hamlet" tag pair is fully 
nested within the "Shakespeare" tag pair, and the tag pairs do not, for example, overlap 
each other. 
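
A nesting check of this kind can be sketched on hashed tags as follows. This is an illustrative sketch only, not the implementation of the step 320: it assumes that hashed end tags are written with a leading "/" as in [3], and that every start tag has a corresponding end tag. The function name is_well_nested is hypothetical.

```python
def is_well_nested(hashed_tags):
    """Check nesting of hashed tags; end tags are written as "/<code>"."""
    open_tags = []  # stack of hash codes for the currently open start tags
    for tag in hashed_tags:
        if tag.startswith("/"):
            # An end tag must close the most recently opened start tag.
            if not open_tags or open_tags.pop() != int(tag[1:]):
                return False
        else:
            open_tags.append(int(tag))
    # Every start tag must have been closed by the end of the input.
    return not open_tags
```

For example, the sequence 133, 411, 629, /629, /411, /133 (a "Hamlet" pair fully nested within a "Shakespeare" pair) passes the check, whereas the overlapping sequence 133, 411, /133, /411 fails it. Each test is an integer comparison.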

If, on the other hand, no error is detected, the parsing process 344 is directed to 
the optional process 348, in which the validation checking step 326, using respective 
processors 414 or 505, is performed with reference to a DTD or an XML Schema. As 
noted, validation checking is a more detailed form of checking than well-formedness 
checking. Thus, for example, whereas the well-formedness check considers whether the 
"Hamlet" tag pair is properly nested within the "Shakespeare" tag pair, validity checking, 
in contrast, checks both for proper nesting, in the sense that the "Hamlet" tag pair is fully 
nested within the "Shakespeare" tag pair, and also whether "Hamlet" tag pairs may 
legally be nested in this way. There may, for example, be a situation where, in fact, 
"Shakespeare" tag pairs must be nested within "Hamlet" tag pairs, rather than the other 
way around. Thus, the validity checking process checks hierarchical relationships of tags, 
in this case being whether "Hamlet" tag pairs may be nested within "Shakespeare" tag 
pairs, as well as considering whether nesting has been properly, namely completely, 
performed. 

In order to perform the validation step 326, DTD or XML Schema tags are first 
hashed in a hashing step 328, in order to bring the DTD/XML Schema memory 
representation into conformity with the hashed nature of the mark-up document which has 
been generated by the hash step 318. The validation checking step 326 compares the 
mark-up document structural representation generated in the step 318 to the structural 
representation of the DTD/XML Schema generated in the step 328, to verify correct 
syntactic placement of syntactic elements in the markup document, noting that the string 
comparisons required for this comparison as used in step 220 in relation to Fig. 2, are now 
replaced, in Fig. 3(c), by faster and more efficient numerical comparisons, as a result of 
the hashing operations in steps 328 and 318. 

After validation, the process 344 is directed to an error checking step 330, in 
which corrective action and/or error indication is performed as indicated by an arrow 332. 
If no errors are detected, the parsing process 344 proceeds to an action selection step 334, 
whereupon if the syntactic element is a tag type, the corresponding tag string is sent to the 
application in respect of which the parsing process is being performed, and the tag string 
itself is deleted from memory, this being either 418 or 512 in Figs. 5 and 6 respectively. 
The associated hashed tag memory representation is, however, retained. Accordingly, no 
string-based memory representation of the tag is retained, other than one copy of the 
currently parsed tag string. The memory representation of the tag is thus only in hashed 
form. If the element syntactic type is either a non-tag type, or a non-tag name type, then 
as depicted by an arrow 338, the value of the element, or a string representation thereof is 
sent to the associated application, and the associated memory representation is deleted. 
The parsing process 344 now loops back, as indicated by an arrow 340, to "f" on the 
dashed boundary line 358, and thereafter to the corresponding "f" on the dashed boundary 
line 358 in Fig. 3(b), and thereafter to "b" on the dashed boundary line 356, and thereafter 
to "b" on the dashed boundary line 356 in Fig. 3(a), and finally to the character testing 
step 302. If no further characters are detected, the parsing process 344 terminates in a 
step 342. 

The XML document fragment [1], with tags in hashed form, has the following 
form: 

505 <133> 
510 <!-- This is a comment --> 
515 <326 class="preface" name1="value1" name2="value2"> 
520 <371 !tet=&tt> </371> 
525 <787> 
530 Say                                                   [4] 
535 <629> 
540 goodnight </629>, 
545 Hamlet.</787> 
550 <411><629>Goodnight, Hamlet. </629></411> 
555 </133> 
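
The substitution of hashed tag names into a fragment such as [4] can be sketched with a regular expression. The sketch below is illustrative only: it assumes the mapping of Table 1, rewrites only the tag name that follows "<" or "</", and leaves comments, attributes and character data untouched. The names HASH_TABLE, TAG_NAME and hash_tag_names are hypothetical.

```python
import re

# Hash mapping of Table 1 (assumption: tag names are lower-cased for lookup).
HASH_TABLE = {"shakespeare": 133, "div": 326, "mult": 371,
              "banquo": 787, "quote": 629, "hamlet": 411}

# "<" or "</" followed by a tag name; "<!--" does not match because "!" is
# neither "/" nor a name start character.
TAG_NAME = re.compile(r"<(/?)([A-Za-z_][\w.-]*)")


def hash_tag_names(markup: str) -> str:
    """Replace each tag name in the markup by its hash numeral."""
    def replace(match):
        slash, name = match.groups()
        return f"<{slash}{HASH_TABLE[name.lower()]}"
    return TAG_NAME.sub(replace, markup)
```

Applied to the string `<Hamlet><quote>Goodnight, Hamlet. </quote></Hamlet>`, the function returns `<411><629>Goodnight, Hamlet. </629></411>`, which corresponds to line 550 of [4]. A tag name outside the domain of Table 1 raises a KeyError in this sketch.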

The representation of closing tags (which typically use syntax: </section> as 
opposed to start tags which use syntax <section>) can be defined in various ways, thereby 
attaining more, or less, compatibility with the XML standard. It is noted that start tags 
and end tags are considered, in the present description, to be "equivalent types". 
Furthermore, the fact that the start and end tag perform a collective function, namely 
delimiting sections of document content, is taken to mean that there is a relationship 
between the two tags. It is further noted that the aforementioned syntax for start and end 
tags means that the end tag is a modification of the start tag, wherein a distinguishing 
character, namely a "/", is incorporated into the start tag in order to produce a 
corresponding end tag. Compatibility with the XML standard can be more important in 
some instances than in others. In the preferred embodiment, the "/" character of a current 
tag string is typically removed prior to hashing the following tag name, in order that 
identical start and end tag names return the same numeral from the hashing function. An 
XML tag is exemplified by <Name Attribute> (as can be seen in Section 3.1 of 
"Extensible Markup Language (XML) 1.0 (Second Edition) W3C Recommendation, 6 
October 2000", which is available on the Internet at 
http:\\www.w3.org\tr\2000\rec-xml-20001006.html). In the present description, the term "tag" can refer, depending on the 
context, to either a part of, or the entirety of, the particular tag being considered. 
Alternatively, there may be situations where it is desired to retain an equivalent 
representation of the "/" character (identifying the end tag) in memory. This can be done 
in a variety of ways, such as: (i) reinserting the "/" character or an equivalent character 
into memory in proximity to the end tag hash numeral so as to indicate that it is an end 
tag, (ii) using a boolean value to indicate the end or start state of a hashed tag, or (iii) 
negating the end tag hashed value so that a simple addition of start and end tags yields 
zero for a perfect match. In Option (iii), the hashed start tag has been modified by an 
operator, in the present case a simple negation operation, in order to produce the requisite 
hashed end tag. Option (iii) requires that a sign bit be guaranteed to be free from 
influence by the hashing algorithm. This option is, in fact, very similar to the boolean 
flag option (ii). 
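
Options (ii) and (iii) can be sketched as follows, for illustration only. The sketch assumes that the hashing function returns strictly positive integers, so that the sign is free to carry the start or end state. The names HashedTag, negate_for_end and is_matching_pair are hypothetical.

```python
from typing import NamedTuple


class HashedTag(NamedTuple):
    """Option (ii): a hash numeral together with a boolean end-tag flag."""
    code: int
    is_end: bool


def negate_for_end(start_hash: int) -> int:
    """Option (iii): represent an end tag as the negated start-tag hash."""
    if start_hash <= 0:
        raise ValueError("the hash must be positive so that the sign is free")
    return -start_hash


def is_matching_pair(start_hash: int, end_hash: int) -> bool:
    """A start tag and its negated end tag sum to zero for a perfect match."""
    return start_hash + end_hash == 0
```

With the values of Table 1, the "quote" start tag 629 and its end tag -629 sum to zero, whereas 629 and -787 do not. The sign of the integer plays the same role as the boolean flag of option (ii).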

Furthermore, structured hash numbers can be generated in which a hash number 
for a nested tag can explicitly indicate the higher-level XML tags within which the first 
tag is nested. Thus, for example, where tag 123 is nested inside tag 987, then instead of 
being designated as nested tag 123, it can be designated as 987.123. This structured, or 
"extended", hashing can allow further parsing performance improvements by reducing 
structure-spanning operations, ie. by reducing an amount of the XML document which 
must be held in memory while the end-points of a tag pair are being searched for. 

It is also noted that extended representations need not be based upon hashing, but 
can also be based upon strings, or "enumeration", which is a process whereby a mapping 
is defined between tag names and numerals, thereby creating an enumeration table or 
index. A simple form of enumeration is to merely list all the tag names, and to number 
the listed tags. Thus, for example, a concatenated string of the form 
"Shakespeare.banquo.quote" represents a string-based extended representation of three 
concatenated tags. 
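
A simple enumeration of the kind described can be sketched as follows. The sketch is illustrative and assumes that tags are numbered from 1 in order of first appearance; the function names build_enumeration and invert are hypothetical.

```python
def build_enumeration(tag_names):
    """Number the distinct tag names from 1, in order of first appearance."""
    table = {}
    for name in tag_names:
        if name not in table:
            table[name] = len(table) + 1
    return table


def invert(table):
    """Build the reverse index, from numeral back to tag name."""
    return {number: name for name, number in table.items()}


TABLE = build_enumeration(
    ["Shakespeare", "div", "mult", "banquo", "quote", "Hamlet", "quote"])

# Enumeration-based extended representation of three nested tags.
EXTENDED = ".".join(str(TABLE[n]) for n in ["Shakespeare", "banquo", "quote"])
```

Unlike a hash, an enumeration table of this kind is trivially reversible, but the table itself must be stored or shared between the parties that use it.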

A structured equivalent hashed markup example for the XML fragment [3] is 
presented in [5] below using negated, hashed end tags. 

133 
133.326 
133.371 
133.-371 
133.787 
133.787.629  -> 013307870629 
133.787.-629 -> -013307870629 
133.-787                                [5] 
133.411 
133.411.629 
133.411.-629 
133.-411 
-133 



In [5], the structure of nested tags is converted from the form shown in [3] into a 
series of concatenated hashed tags, in which each subsequent lower (ie. more deeply 
nested) hierarchical level of hashed tag is directly linked to its previous upper hierarchical 
levels. This allows simple numerical comparison to be performed with a similarly parsed 
structure from a hashed DTD. In fact, each line in [5] is represented, as shown in [5] for 
lines 4 and 5, by a single numeral which is combined by concatenation of the set of 
hashed tags encountered. This single numeral represents in a very compressed form both 
the identities and relationships of the original input tags, and accordingly enables a very 
efficient comparison method with a similarly hashed DTD. It can be seen that the 
numerical tag sets can be used to represent the document structure in a highly compressed 
form. A validation check can be performed using merely the hashed start tag sets, noting 
that each such set represents the deepest, and entire, structure of each branch of the 
document structure. For instance, the structure of [5] can be minimally represented in [6] 
as follows: 

01330326 

01330371 [6] 

013307870629 

013304110629 



A DTD or XML schema structure can also be represented by the same method. 
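
The concatenation used in [5] and the reduction to [6] can be sketched as follows. The sketch is illustrative only. It assumes, from the numerals shown in [5], that each hashed tag is zero-padded to four digits before concatenation and that a negated last tag negates the whole numeral. The names encode_extended, deepest_start_sets, START_PATHS and MINIMAL are hypothetical.

```python
def encode_extended(path, width=4):
    """Concatenate hashed tags into one numeral, zero-padding each tag.

    A negative last element marks a negated (end) tag and negates the
    whole numeral, as in the lines of [5] annotated with "->".
    """
    sign = "-" if path[-1] < 0 else ""
    return sign + "".join(f"{abs(code):0{width}d}" for code in path)


def deepest_start_sets(start_paths):
    """Keep only the paths that are not a proper prefix of another path."""
    paths = [tuple(p) for p in start_paths]
    return [p for p in paths
            if not any(len(q) > len(p) and q[:len(p)] == p for q in paths)]


# Start-tag lines of [5], in document order.
START_PATHS = [(133,), (133, 326), (133, 371), (133, 787), (133, 787, 629),
               (133, 411), (133, 411, 629)]

# Minimal representation corresponding to [6].
MINIMAL = [encode_extended(p) for p in deepest_start_sets(START_PATHS)]
```

The numerals are kept as strings here so that the leading zero is preserved. A fixed-width integer could be used instead on a constrained device, provided that the deepest expected nesting fits the chosen word size.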

A single, or multiple set of numerical comparisons between a tag set from the 
parsed & hashed input document and a tag set from the parsed & hashed DTD replaces a 
series of string and structure comparisons normally required in XML parser validation. It 
can be recognized that any alternate valid structures defined by a DTD or XML schema 
can be encoded into unique hashed tag set numerals for later comparison with hashed tag 
set numerals generated from an input XML document. 

Fig. 4 depicts a process 600 for validating a mark-up document against a VRD 
such as, for example, a DTD, or an XML schema. The process 600 commences with a 
step 602 in which a markup document to be validated is opened. Thereafter, in a step 604, 
a current extended tag is reset by a respective processor 414 or 505 in Fig. 5 or Fig. 6. In 
the description relating to Fig. 4, the terms "tag", "extended tag", "temporary tag" and so 
on refer to the hashed representations of the respective tags. In a following step 606, a 
temporary tag root is reset by one of the respective processors 414 or 505, after which a 
next tag in the markup document is identified in a step 608. Thereafter, a testing step 610 
determines whether the tag identified in the step 608 is a start tag, in which event the 
process 600 is directed in accordance with a "yes" arrow to a step 612 which adds the tag 
identified in the step 608 to the extended tag using one of the respective processors 414 or 
505. The process 600 is then directed from the step 612 back to the step 608. 

If the testing step 610 determines that the next tag is not a start tag, then the 
process 600 is directed in accordance with a "no" arrow to a testing step 614, which 
determines whether the extended tag = "0", which represents the root level of the 
document. If the "0" value is detected, then the process 600 is directed in accordance 
with a "yes" arrow to a testing step 624 which determines whether the end of the 
document has been reached. If this is not the case, then the process is directed in 
accordance with a "no" arrow back to the step 606. It is noted that detection of a "0" 
value in the step 614 may also result from a document structure which is not well 
formed, such as would be the case for a structure having a mismatched number of start 
and end tags. 

If the testing step 614 determines that the extended tag value is not equal to "0", 
then the process is directed in accordance with a "no" arrow to a testing step 616, which 
determines whether the extended tag is equal to the temp tag root value. If this is found 
not to be the case, then the process 600 is directed in accordance with a "no" arrow to a 
step 618 which stores the extended tag in a document list in a respective memory 418 or 
506. If on the other hand, the testing step 616 determines that the extended tag is equal to 
the temp tag root, then the process 600 is directed in accordance with a "yes" arrow to a 
step 620 which removes the lowest (namely the most deeply nested) tag from the 
extended tag, using a respective processor 414 or 505. Thereafter, in a step 622, the 
process 600 copies the extended tag to the temp tag root, after which the process is 
directed back to the step 608. 

Prior to returning to the testing step 624, it is noted that the process 600 as 
heretofore described is directed to the markup document whose validity is being checked. 
There is also, however, an identical process, not explicitly described, which is applied to 
the validation reference document (VRD) to thereby produce a VRD list against which 
the document list produced by the process 600 can be tested. The process 600 and the 
equivalent process directed to the VRD typically occur at different times. The process 
600 occurs for every document being validated, and produces a list of extended hash 
representations for each particular document being validated. The VRD list can be 
produced substantially concurrently with the process 600, providing that the VRD list is 
completed prior to the step 626. Alternately, the VRD process can be performed off line, 
and the resultant list provided to the process 600 prior to the step 626. 

Returning to the step 626, and since the VRD list is available as noted, the step 
626 determines whether every entry in the document list is to be found in the VRD list. If 
this is the case, then the process 600 is directed in accordance with a "yes" arrow to a step 
628 which declares that the document is not detected as invalid. If on the other hand, the 
document list has an entry which is not to be found in the VRD list, then the process 600 
is directed in accordance with a "no" arrow to a step 630 which declares that the 
document is invalid. 

The above description compares, as described in more detail in regard to the step 
626, all the document list entries with all the VRD list entries. An alternate process is to 
test each extended tag, after the step 616, against the complete VRD list in a step similar 
to the step 626, in which event if the document extended tag is not to be found in the 
VRD list, the process 600 can proceed directly to the step 630, saving unnecessary further 
testing. In the event that the extended tag is however to be found, then the process 600 
can be directed to the step 620 and so on. The alternate arrangement provides earlier 
recognition of an error, and immediately aborts the validation process, which provides 
added efficiency provided the VRD list is relatively short. If a complete validation check 
is implemented with the above method, then the step 628 indicates that the document 
being considered is valid. 

In order to further illustrate the validation method described, the following 
structure fragment is considered, in which start tags are "01" to "05", and the 
corresponding end tags are "-01" to "-05" respectively. 



01 
    02 
        03 
        -03 
        04                              [7] 
        -04 
    -02 
    05 
    -05 



In functional terms, the process 600 traverses down to the deepest part of a 
branch in the hierarchical structure of the mark-up document, namely from "01" on the 
first line of [7] to "03" on the third line of [7], and stores an extended hash representation 
for the deepest part of that particular branch, namely "010203". The process then 
traverses up the branch, discarding end tags, until it finds another start tag which indicates 
a new branch to pursue, which is "04" on line 5 of [7] in this example. As the process 
traverses down the new branch, the process preserves the extended hash representations 
of higher levels of the hierarchy, until it has stepped back above those levels. An error in 
the document structure, resulting in an invalid document or a document which is not well 
formed, will typically return extended hash representations that do not match those of the 
VRD. The step 620 may optionally include well-formedness checks of the retrieved end 
tag against the previous start tag, thereby providing a well-formedness match if the 
document is well formed. It is noted that the previous start tag is the lowest tag in the 
extended hash representation. For example the DTD/XML Schema may return end tags 
at the testing step 610 that do not match the lowest start tag in the extended hashed 
representation in the step 620, thereby failing a well-formedness check. 
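
The traversal described above can be sketched as follows, with the tags of [7] held as signed integers. This is an illustrative sketch and not the implementation of the process 600. It assumes that the process continues from the storing step 618 to the step 620 (the description does not state the path out of the step 618), that the extended tag "0" corresponds to an empty list, and that each tag is zero-padded to two digits as in [7]. The names build_document_list, as_numeral, FRAGMENT_7 and DOCUMENT_LIST are hypothetical.

```python
def build_document_list(tags):
    """Collect the deepest extended tag of each branch.

    `tags` is a sequence of ints: positive values are hashed start tags,
    negative values are negated, hashed end tags.
    """
    extended = []         # current extended tag (reset in the step 604)
    temp_root = None      # temporary tag root (reset in the step 606)
    document_list = []
    for tag in tags:                               # step 608: next tag
        if tag > 0:                                # step 610: start tag?
            extended.append(tag)                   # step 612
            continue
        if not extended:                           # step 614: root level
            temp_root = None                       # back to the step 606
            continue
        if extended != temp_root:                  # step 616
            document_list.append(tuple(extended))  # step 618
        extended.pop()                             # step 620 (assumed path)
        temp_root = list(extended)                 # step 622
    return document_list


def as_numeral(path, width=2):
    """Concatenate the tags of a path, zero-padding each to `width` digits."""
    return "".join(f"{code:0{width}d}" for code in path)


# Structure fragment [7].
FRAGMENT_7 = [1, 2, 3, -3, 4, -4, -2, 5, -5]
DOCUMENT_LIST = [as_numeral(p) for p in build_document_list(FRAGMENT_7)]
```

For the fragment [7] the sketch stores "010203" on reaching the end tag "-03", then "010204" for the branch beginning at "04", and finally "0105". The end tag "-02" stores nothing, because the extended tag "0102" already equals the temporary tag root at that point.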

The test at the step 626 typically seeks to match the extended hashed 
representations of the mark-up document structure against those hashed representations 
listed for the DTD of that document. The hashed representations of the document 
structure will typically be a subset of the deepest structural representations from the DTD 
list. Accordingly, a valid XML document is permitted to contain any legal subset of the 
structural nesting defined in the corresponding VRD, or DTD. Therefore, a typical test in 
the step 626 includes comparisons of shallower hashed structural representations of the 
document against a deeper hashed representation of the DTD. Thus, for example, an 
extended hashed representation "0123" from the XML document would be assessed as 
"valid" when compared to a hashed representation "01230456" from the corresponding 
DTD. 
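
The comparison of a shallower document entry against a deeper VRD entry can be sketched as a prefix test. The sketch is illustrative only: it assumes that extended hashed representations are held as digit strings in which every tag has the same fixed width, so that a leading part of a VRD entry always ends on a tag boundary. The names entry_is_valid and document_not_invalid are hypothetical.

```python
def entry_is_valid(doc_entry: str, vrd_list) -> bool:
    """A document entry passes if it equals, or leads, some VRD entry."""
    return any(vrd_entry.startswith(doc_entry) for vrd_entry in vrd_list)


def document_not_invalid(document_list, vrd_list) -> bool:
    """Test of the step 626: every document entry is found in the VRD list."""
    return all(entry_is_valid(entry, vrd_list) for entry in document_list)
```

With a VRD list containing "01230456", the document entry "0123" passes, whereas a document list containing "0199" fails. A variant that returns on the first failing entry corresponds to the alternate arrangement described above, in which the process proceeds directly to the step 630.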

The validation process 600 shown in Fig. 4 can be optimized to check the more 
complex parts of a document structure, such as the most deeply nested portions, in a fast 
but incomplete check of validity. Thus, an optimal combination of speed and validity 
sensitivity can be selected, in order to implement a particular validating parser having 
arbitrary performance characteristics. 

The validation method 600 can also be modified to perform at least portions of 
"standard" well-formedness checks. Thus, for example, in the step 620, the hashed 
representation of the end tag can be checked against the lowest hierarchical hashed tag 
representation within the extended tag representation. If the aforementioned 
representations do not properly resolve to the same original tag identity, then the 
document is not well-formed, and a recovery, or error action can be performed. 

The above method can be extended to include hashed representations of defined 
attributes within a structure, either separately, or together with structure checking. 

It is apparent that this method of validation and well-formedness checking can be 
applied to an input document in a separate process to the process for parsing of the 
document structure and content. Thus, for example, the method 600 can be optimised in 
order to achieve an efficient and high-speed validity and well-formedness check that can 
be performed even in environments where central processing unit (CPU) cycles and 
memory size are not particularly subject to major constraints. The advantage of 
performing a separate check in such systems includes the fact that a highly optimized 
check can be used to quickly discard "invalid" documents. This can save considerable 
time and processing of at least part of an invalid document, thereby preventing, for 
example, (i) parsing of the document into a full DOM tree representation and then 
performing validation checking only to find that the document is invalid, or (ii) 
commencing further processing of a first (valid) part of a document prior to detecting an 
invalid second part of the document, the further processing of the first part of the 
document being thereby rendered futile. Another advantage is that after a document is 
discovered to be invalid using the fast validation check, processing of a following job can 
be immediately commenced. 

An "imperfect" hash process, ie. a hash process which is not guaranteed to 
produce a unique numeral for each alphanumeric input string, can be adequate in certain 
cases, in particular where the maximum length of XML tag strings is constrained, or is at 
least constrained to some level of probability. Furthermore, in cases where the set of 
XML tag strings is constrained to some limited number of character permutations, or is 
constrained with some probability to a limited number of character permutations, the 
imperfect hash process can be designed, or selected, to operate adequately. 
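
Whether a given imperfect hash operates adequately on a constrained set of tag names can be checked ahead of time by searching for collisions, as in the following illustrative sketch. The hash shown (toy_hash) is a hypothetical example chosen for brevity; it is not the hash used to produce Table 1 and does not reproduce those values. The function find_collisions is likewise hypothetical.

```python
def toy_hash(name: str, modulus: int = 1000) -> int:
    """Illustrative, imperfect hash: polynomial sum of character codes."""
    value = 0
    for ch in name:
        value = (value * 31 + ord(ch)) % modulus
    return value


def find_collisions(names, hash_fn=toy_hash):
    """Return the groups of distinct names that share a hash value."""
    buckets = {}
    for name in set(names):
        buckets.setdefault(hash_fn(name), set()).add(name)
    return [sorted(group) for group in buckets.values() if len(group) > 1]
```

If find_collisions returns an empty list for the complete set of permitted tag names, the hash is collision-free on that set, even though it offers no such guarantee for arbitrary input strings.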

A communications standard, or alternative public or private format(s) for 
numerical representation of a document structure can be defined or described based on the 
use of a hash algorithm. This technique allows a form of compression, which can be of 
benefit in transmission of XML data which normally involves transmission of a 
significant amount of data because of its verbosity, and human-readable ASCII form. 
Various options exist for retaining or discarding human-readability, for example by 
combining (perfect) hashing with other forms of compression, which are respectively 
applied to differing element types within an XML file. For instance, it is possible to 
replace XML string tags with unique, human-readable numerals derived from a perfect 
hashing algorithm. Un-hashed syntactic and other elements can also be compressed by a 
lossless compression technique for transmission between processes or devices, thereby 
reducing the amount of transmitted data. 

An inverse or reversible hash algorithm can be referenced or included where 
required as discussed in the previous paragraph. This is used where, for example, such an 
algorithm is needed to decode or decrypt one or more markup tags into a human-readable 
string for display or labelling purposes from a pre-hashed, transmitted markup document, 
where it is otherwise not necessary to do so for parsing and error-checking purposes. 
Another use of a reverse or inverse hash algorithm is to allow decryption of markup tags 
or other data to enable a restricted function or feature relating to the transmitted markup 
document. Reverse or inverse algorithms can also be used for matching a transmitter and 
a receiver of markup documents, where the reverse or inverse hash algorithm is already 
included in the receiver, and is not transmitted, but might be referenced in the markup 
document. Examples of reversible or invertible hash algorithms include (i) fully lossless 
encoding algorithms and (ii) Huffman encoding algorithms. 

The aforementioned arrangements can be applied to any markup language, with 
particular advantages where one or more of the following conditions apply, namely (i) the 
markup language allows definition of tag names (e.g. XML, DTD, CSS, XSL, etc), (ii) 
tag names use large character encoding tables (e.g. UTF-16) and/or tag name length is not 
typically shorter than the hashed representation thereof, (iii) the intended application 
using or receiving a markup document typically requires representation of complex 
structures with more than one hierarchical level of nesting within a markup document, 
XML Schema, or DTD, (iv) some form of checking, typically well-formedness or 
validation, is required for the input markup document, (v) the markup parser and/or 
application have strong limitations on memory capacity (for example, embedded or 
low-cost CPU systems) or memory management (for example in systems having no virtual 
memory, or no dynamic memory allocation), and (vi) the markup parser and/or 
application need to operate quickly on potentially complex, highly-nested, markup 
documents. 

The disclosed method of parsing a markup language document can be 
implemented in dedicated hardware such as one or more integrated circuits performing 
the functions or sub functions of parsing a markup language document. Such dedicated 
hardware may include graphic processors, digital signal processors, or one or more 
microprocessors and associated memories. 

The method of parsing a markup language document can alternatively be 
practiced using a special purpose embedded computer system 400, such as that shown in 
Fig. 5 wherein the processes of Figs. 3(a), 3(b), 3(c), and 4 may be implemented as 
software, such as an application program executing within the embedded computer 
system 400. The computer system 400 is typically integrated (embedded) into an end 
system such as a printer (not shown) and drives a printer engine 402 in the printer. In 
particular, the steps of the method of parsing a markup language are effected by 
instructions in the software that are carried out by the embedded computer. The software 
may be stored in a computer readable medium, including Read Only Memory (ROM) 418 
or Random Access Memory (RAM) 418 or other types of memory (not shown). The 
software is loaded into the embedded computer during manufacture, or by software 
upgrades performed on-site. 

The embedded computer system 400 comprises a computer module 410, input 
devices such as a switch module 422 for parameter setting, an output device such as a 
Liquid Crystal Display (LCD) showing job status, and the printer engine 402. The 
embedded computer 400 is typically physically integrated into the printer (not shown). 
Print jobs which originate at other computers (not shown) attached to a computer network 
406 are sent to the embedded computer 400 by a connection 404 to an Input/Output (I/O) 
interface 408. 

The embedded computer module 410 typically includes a processor unit 414, a 
memory unit 418, for example formed from semiconductor random access memory 
(RAM) and read only memory (ROM), input/output (I/O) interfaces including a switch 
module and LCD interface 416, and an I/O interface 408 for the printer engine 402 and 
network 406. The components 408, and 414 to 418 of the embedded computer 410 
typically communicate via an interconnected bus 412 and in a manner which results in a 
conventional mode of operation of the embedded computer system 410 known to those in 
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the relevant art. Typically, the program of the arrangement is resident in memory 418. 
and is read and controlled in its execution by the processor 41 4. 

The method of parsing a markup language document can also be practiced using 
a conventional general-purpose computer system 500, such as that shown in Fig. 6 
wherein the processes of Figs. 3(a), 3(b), 3(c), and 4 may be implemented as software, 
such as an application program executing within the computer system 500. This 
application is useful, for example, when hashing is used as a communication standard 
across a network between computers. Fig. 6 shows only one of the communicating 
computers being considered. 

In particular, the steps of the method of parsing a markup language document are 
effected by instructions in the software that are carried out by the computer. The software 
may be divided into two separate parts, namely one part for carrying out the parsing 
methods, and another part to manage the user interface between the latter and the user. 
The software may be stored in a computer readable medium, including the storage 
devices described below, for example. The software is loaded into the computer from the 
computer readable medium, and then executed by the computer. A computer readable 
medium having such software or computer program recorded on it is a computer program 
product. The use of the computer program product in the computer preferably effects an 
advantageous apparatus for parsing a markup language document in accordance with the 
embodiments of the invention. 

The computer system 500 comprises a computer module 501, input devices such 
as a keyboard 502 and mouse 503, output devices including a printer 515 and a display 
device 514. A Modulator-Demodulator (Modem) transceiver device 516 is used by the
computer module 501 for communicating to and from a communications network 520, for
example connectable via a telephone line 521 or other functional medium. The
modem 516 can be used to obtain access to the Internet, other network systems, such as a
Local Area Network (LAN) or a Wide Area Network (WAN), and the other personal
computer (PC) 522 with which the computer 500 is communicating.

The computer module 501 typically includes at least one processor unit 505, a 
memory unit 506, for example formed from semiconductor random access memory 
(RAM) and read only memory (ROM), input/output (I/O) interfaces including a video
interface 507, and an I/O interface 513 for the keyboard 502 and mouse 503 and 
optionally a joystick (not illustrated), and an interface 508 for the modem 516. 

A storage device 509 is provided and typically includes a hard disk drive 510 
and a floppy disk drive 511. A magnetic tape drive (not illustrated) may also be used. A 
CD-ROM drive 512 is typically provided as a non-volatile source of data. The 
components 505 to 513 of the computer module 501 typically communicate via an
interconnected bus 504 and in a manner which results in a conventional mode of
operation of the computer system 500 known to those in the relevant art. Examples of 
computers on which the embodiments can be practised include IBM-PCs and
compatibles, Sun Sparcstations, or like computer systems evolved therefrom.

Typically, the application program of the embodiment is resident on the hard
disk drive 510, and is read and controlled in its execution by the processor 505.
Intermediate storage of the program and any data fetched from the network 520 may be 
accomplished using the semiconductor memory 506, possibly in concert with the hard 
disk drive 510. In some instances, the application program may be supplied to the user 
encoded on a CD-ROM or floppy disk and read via the corresponding drive 512 or 511, 
or alternatively may be read by the user from the PC 522 over the network 520 via the 
modem device 516. 

Still further, the software can also be loaded into the computer system 500 from 
other computer readable medium including magnetic tape, a ROM or integrated circuit, a 
magneto-optical disk, a radio or infra-red transmission channel between the computer 
module 501 and another device, a computer readable card such as a PCMCIA card, and 
the Internet and intranets including email transmissions and information recorded on
websites and the like. The foregoing is merely exemplary of relevant computer readable 
mediums. Other computer readable mediums may be practiced without departing from 
the scope and spirit of the invention.

Industrial Applicability 

It is apparent from the above that the embodiment(s) of the invention are
applicable to the computer and data processing industries. 
The foregoing describes only some embodiments of the present invention, and
modifications and/or changes can be made thereto without departing from the scope and 
spirit of the invention, the embodiments being illustrative and not restrictive. 
Brief Description of the Drawings

A number of preferred embodiments of the present invention will now be
described with reference to the drawings, in which:

Figs. 1(a) and 1(b) show block representations of XML parser systems in which
embodiments of the present invention can be practiced; 

Figs. 2(a) and 2(b) depict a flow chart of method steps for a prior art SAX parser,
including optional well-formedness and/or validation checking steps; 

Figs. 3(a), 3(b) and 3(c) show an improved arrangement of the SAX parser of 
Figs. 2(a) and 2(b); 

Fig. 4 depicts a process for validating a document against a reference document 
such as a DTD or an XML schema;

Fig. 5 is a schematic block diagram of a special purpose embedded computer
upon which an arrangement of the improved SAX parser can be practiced; and 

Fig. 6 is a general purpose computer upon which an arrangement of the 
improved SAX parser can be practiced. 
1. Abstract

A method of parsing a markup language document comprising syntactic 
elements is disclosed, said method comprising, for one of said syntactic elements, the
steps of identifying (310) a type of the element, processing (318) the element by
determining a hash representation thereof if said type is a first type, and augmenting (314) 
an at least partial structural representation of the document using the hash representation 
if said type is said first type. 
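
By way of illustration only, the three steps recited above (identifying a type, determining a hash representation, and augmenting a structural representation) can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the specification: the "first type" is taken to be start and end tags, CRC-32 stands in for the hash function, the partial structural representation is modelled as a stack of hashes of the currently open elements, and the names TOKEN, hash_name and parse are hypothetical. Comments, processing instructions and attributes are not handled, and a hash collision between two different tag names would go undetected.

```python
import re
import zlib

# Hypothetical tokenizer: a token is either a tag "<...>" or a run of text.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def hash_name(name):
    """Hash representation of a tag name (CRC-32 is an assumption of this sketch)."""
    return zlib.crc32(name.encode("utf-8"))


def parse(document):
    """Return (events, well_formed) for a simple tag-only document.

    events is a list of ("start" | "end", hash) pairs; tag names are never
    stored as strings, only as fixed-size hashes.
    """
    open_elements = []  # partial structural representation: hashes of open elements
    events = []
    for token in TOKEN.findall(document):
        # Identify the type of the syntactic element.
        if token.startswith("</"):
            h = hash_name(token[2:-1].strip())
            # Compare hashes instead of strings to check nesting.
            if not open_elements or open_elements.pop() != h:
                return events, False
            events.append(("end", h))
        elif token.startswith("<"):
            body = token[1:-1].strip()
            self_closing = body.endswith("/")
            parts = body.rstrip("/").split()
            if not parts:
                return events, False  # empty tag such as "< >"
            # Determine the hash representation of the tag name.
            h = hash_name(parts[0])
            events.append(("start", h))
            if self_closing:
                events.append(("end", h))
            else:
                # Augment the structural representation with the hash.
                open_elements.append(h)
        # Text content is not of the first type and is left unhashed.
    return events, not open_elements
```

In this sketch each open element occupies one fixed-size integer on the stack regardless of the length of its tag name, which is the kind of memory saving the method is directed at.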



2. Representative Drawing 

Fig. 3(a)



